BINARY LITERACY:  SYSTEMATIC STATIC REVERSE ENGINEERING

This five-day course, battle-tested over more than two dozen deliveries, focuses on the static aspects of reverse engineering: namely, deriving meaning from assembly code simply by reading it. The material has been updated to increase the focus on C++ and has been rewritten from scratch in a modern typesetting package.

We target programs written in unobfuscated C and/or x86 assembly language. The target audience is those who primarily employ dynamic reverse engineering, and/or those who are more comfortable with Hex-Rays than an ordinary disassembly listing. Students need not be expert C programmers, but should be comfortable with the basics of C, most importantly pointers. Similarly, students need not be experts at x86 assembly language, but should be somewhat familiar with it.

Here is a sample of the course material:  Compiler Optimizations (PPT format).

COURSE DESCRIPTION

As the title implies, this course is about analyzing software systems without executing them, as though one were reading a novel. Starting from the basic letters (assembly language instructions), words (basic blocks) are constructed; from there, sentences (functions) may be put together. These are organized into paragraphs (modules) which, taken together, form the bulk of chapters (executable objects). Finally, a collection of chapters makes up a book (software system).

This class splits reverse engineering into two halves: understanding compiled assembly language in terms of the original, high-level C; and comprehending assembly and C code with no comments or debug information.

After a brief refresher in x86 assembly language, the course begins by systematically examining the process of compiling C code into assembly language, and how to manually decompile assembly language back into C. Prior experience teaching this course shows that this gives students a good grounding in reading assembly language. In particular, we examine the following topics, each with relevant exercises from real-world binaries:

  • How concepts regarding C functions are implemented in compiled assembly language
  • How types in C are manifested in assembly language, which does not have a type system
  • How C expressions and statements are compiled, and manual decompilation thereof
  • Conditionals (and compound conditionals) in C, their compilation into assembly, and their decompilation
  • Control-flow statements in C and their compilation and decompilation
  • Structures in C, and the challenges they pose for reverse engineering
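
As a small illustration of the kind of correspondence listed above (not an excerpt from the course material), the sketch below pairs a C function that combines a structure access with a compound conditional against one plausible 32-bit x86 rendering. The structure, its field names, and the function name are hypothetical, invented for this example; the assembly in the comment is hand-written under the assumption of a cdecl calling convention with a standard frame pointer, 4-byte int and pointer types, and no optimization, so real compiler output will differ in detail.

```c
#include <stddef.h>

/* Hypothetical record type, invented for this illustration. */
struct packet {
    int            type;     /* offset 0 (assuming 4-byte int and pointers) */
    unsigned int   length;   /* offset 4 */
    unsigned char *payload;  /* offset 8 */
};

/* Returns 1 if the packet looks usable, 0 otherwise. */
int packet_is_valid(const struct packet *p)
{
    /* Compound conditional: each && term becomes its own test-and-branch. */
    if (p != NULL && p->length != 0 && p->payload != NULL)
        return 1;
    return 0;
}

/*
 * One plausible rendering in Intel syntax (a hand-written sketch,
 * not actual compiler output):
 *
 *         mov  eax, [ebp+8]            ; eax = p (first argument)
 *         test eax, eax
 *         jz   short fail              ; p == NULL
 *         cmp  dword ptr [eax+4], 0
 *         jz   short fail              ; p->length == 0
 *         cmp  dword ptr [eax+8], 0
 *         jz   short fail              ; p->payload == NULL
 *         mov  eax, 1
 *         jmp  short done
 *   fail: xor  eax, eax
 *   done: ...
 *
 * Note that the field names are gone: only the offsets +4 and +8 survive,
 * and the short-circuit && appears as a chain of branches to one label.
 */
```

Reading in the other direction, an analyst who sees three consecutive tests branching to the same label, two of them at fixed offsets from one register, can reasonably hypothesize a pointer to a structure with at least three fields and a single short-circuit condition.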

The first half also covers:

  • Linkers and loaders
  • Compiler optimizations, an advanced topic, in great depth
  • Time permitting, we may also discuss the basics of C++ reverse engineering (calling conventions, allocation scopes, constructors and destructors, and polymorphism)
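
To give a flavor of why compiler optimizations deserve that depth, here is a well-known example, offered as an illustration rather than as an excerpt from the course: optimizing compilers commonly replace unsigned division by a constant with a multiplication and a shift. The function names below are hypothetical; the constant 0xCCCCCCCD and the shift count of 35 are the standard values for dividing a 32-bit unsigned integer by 10.

```c
#include <stdint.h>

/* What the programmer wrote. */
uint32_t div10_source(uint32_t x)
{
    return x / 10;
}

/*
 * What the optimized binary typically computes, written back in C:
 * multiply by the "magic" reciprocal, keep the full 64-bit product,
 * and shift right by 35. In x86 assembly this often appears as
 *
 *     mov  eax, 0CCCCCCCDh
 *     mul  ecx                 ; edx:eax = ecx * 0CCCCCCCDh
 *     shr  edx, 3              ; high dword >> 3 == product >> 35
 *
 * with no div instruction anywhere in sight.
 */
uint32_t div10_compiled(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The two functions agree for every 32-bit input, so recognizing the multiply-and-shift idiom lets the analyst write the far more readable `x / 10` in the decompiled result.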

Understanding the structure of a sentence is not enough to understand its actual meaning, just as understanding one sentence is not enough to understand a paragraph. Decompilation is therefore not enough: the human analyst needs techniques to comprehend the code that he or she is seeing. Thus, the second half proceeds with techniques to derive semantic meaning from assembly code.

In particular, the class discusses a systematic and complete process for binary comprehension, conveyed through lectures, exercises, and hands-on sessions reverse engineering malware in IDA. (It should be stressed that this is not a course on malware specifically: this is a course on reverse engineering in general, and its techniques are applicable to all sub-fields thereof -- malware, security, interoperability.)

CLASS REQUIREMENTS

  • A laptop with IDA Pro installed on it (any recent version will do, on any of its supported operating systems)
  • Basic knowledge of C programming, especially pointers
  • Exposure to x86 assembly language is assumed (it will be briefly reviewed, but not treated in depth)
  • A firm grasp of the English language

TESTIMONIALS

"I've taken a bunch of trainings and [Binary Literacy] is in my top three.  It sparked my interest in becoming better at static analysis and reasoning about programs.  It was a fantastic training that I'd recommend to anyone."

"Rolf's Binary Literacy course was excellent and easily the best training course I have taken. The course helped set me on my way in reverse engineering as a career path. The structure and style of the course was well done and it helped me 'connect the dots' with many things that were not clear to me prior to the course and has helped me achieve a senior level position just over 2 years after taking the course."