SMT-BASED BINARY PROGRAM ANALYSIS FROM THE GROUND UP

AVAILABILITY

Private on-site courses also available.  Please contact training@msreverseengineering.com for more information.

Course Description

This five-day course is meant to imbue students with a thorough understanding of all concepts and technologies involved in SMT-based binary program analysis.  It provides a bridge from low-level programming topics, through the relevant concepts in programming language theory and computational logic, into the pasture of actually applying SMT solvers to analyze programs.

Beginners to this discipline often find the academic literature unapproachable due to densely-written papers containing terse mathematical formalism.  In contrast, this course assumes no knowledge of mathematics on the part of the student. Rather than "dumbing down" the material (that is, giving it a facile treatment by removing essential complexity), we patiently explain all relevant concepts in a language that is familiar to the students.

The course walks the students through all relevant concepts and technologies by having them participate in the from-scratch construction of an SMT-based binary analysis framework, which includes SAT and SMT solvers, an x86 disassembler and assembler, the components of an Intermediate Representation (IR) framework, and an x86-to-IR translator.  After a crash course on how to write programs that analyze other programs, students shall be provided with a version of the Pandemic binary program analysis framework in which the uninteresting boilerplate components are already completed.  The programming exercises involving coding the more interesting parts of the framework components.  These exercises are supplemented by thorough test suites, resulting in a high-quality framework, as well as allowing students to ensure that their solutions are accurate in real-time.

Simultaneously, students will be introduced to SAT and SMT solver technologies.  The course provides dozens of exercises involving practical usage of Z3 through its Python and SMT-LIB2 interfaces.  Example applications include input generation (producing inputs to code sequences that cause desired outcomes), formal verification (proving that code works as intended), and equivalence checking (proving that different code sequences always behave the same way).

The final part of the course blends the programming and SMT elements.  We demonstrate a wealth of practically-interesting applications of SMT solvers to binary analysis.  See the syllabus below for more details.

Students shall receive printed copies of the lecture slides, an implementation manual, an exercise manual, as well as the Pandemic binary program analysis framework and its thorough HTML documentation.

PREREQUISITES

Students are assumed to have a low-level background (i.e., knowledge of an assembly language and/or low-level C programming), and be comfortable programming in Python.

SAMPLES

A sampling of slides from the entire course are shown here.  They should be consulted while reviewing the syllabus below.  Additionally, the entirety of the X86 material -- the slides, code, documentation, implementation manual, and exercise manual -- is available here.  (This sample also contains the introductory Python review and attendant exercises.)

Course SYLLABUS

  1. Introduction
    • Basics of Analyzing Programs
      • This unit is intended to warm the student up to the types of programming that we will be doing throughout the course.  It describes basic techniques and concepts in analyzing programs.  We thoroughly examine a small programming language and write a suite of tools for analyzing and manipulating it, culminating in a tiny compiler for it.
        • Programming language structure
        • Representing computer programs with data structures
        • Lexing and parsing
        • How program analysis algorithms are structured
        • Pattern-matching notation
        • Evaluation
        • Type-Checking
        • Language translation
  2. SAT and SMT Solvers
    • Propositional Logic
      • This module informs students that they are already familiar with propositional logic from their previous programming experience.  We discuss propositional formulae and what it means to solve a formula.  We write a number of tools for interacting with propositional formulae, such as an evaluator, truth-table printer, and a simple SAT solver.
    • Applications of SAT Solvers
      • The rubber hits the road as students are introduced to modeling problems via propositional logic and obtaining solutions to them using a SAT solver.  Students complete a number of practical exercises using SAT solvers.
    • Introduction to SMT Solvers
      • Students are introduced to the basic concepts behind SMT solvers:  what they are, what they do, and a glimpse at how they are used in practice.
    • Bit-Vector Logic
      • For the purposes of analyzing software, one of the most important elements of an SMT solver is its support for so-called "bit-vector logic", which allows us to model the integer operations within computer programs.  This module  provides a thorough treatment of how SMT solvers solve formulae involving bit-vectors.  Programming exercises involve implementing an SMT solver.
    • Array Logic
      • This unit introduces students to the second-most important logic used in the analysis of programs, namely the SMT array theory, which is used to model memory.
    • Applications of SMT Solvers
      • We begin with a few warm-up applications involving modeling and solving puzzle-type problems using SMT solvers.  Next, we dive into the mechanics of how SMT solvers may be used to analyze code.  After covering a few necessary topics (such as SSA form, and the concepts of preconditions and postconditions), we demonstrate a number of applications of SMT solvers to program analysis.  The students are given the chance to master their new-found understanding with more than three dozen exercises applying SMT solvers.
        • Formal verification (i.e., proving that software behaves according to a specification).
        • Input generation (i.e., finding inputs that cause some desired behavior to occur, such as in automated exploit generation).
        • Equivalence checking (i.e., proving that two implementations of the same algorithm behave identically).
    • DPLL SAT Solvers
      • This optional module, to be covered if time permits, peeks under the hood of modern SAT solvers, with an eye toward how they have become so capable in modern times.  Programming exercises ask the student to fill in parts of a skeleton DPLL SAT solver.
  3. X86 Assembly
    • This unit exhaustively covers 32-bit X86 machine code.  We discuss how instructions are encoded and decoded.  Hand-written exercises see the students through the encoding and decoding process.  Programming exercises have students complete a suite of tools involving X86 machine code (an assembler, disassembler, and type-checker).
  4. Intermediate Representations
    • Overview
      • This module introduces students to intermediate representations.  It begins by tracing the origins of IRs to compiler theory, discusses their utility, and then explains how they are useful in binary analysis.  We discuss design decisions in creating an IR before detailing the IR used by our framework.
    • IR Framework Components
      • This module previews the components required by any IR-based binary analysis platform.
        • Lexer and Parser
        • Type-Checker
        • X86-to-IR Translator
        • Interpreter
        • SSA Converter
        • IR-to-SMT Integration
    • IR Translator Synthesis
      • Creating an IR translator for a given processor can be a very tedious process.  We explore an approach based upon program synthesis that allows a human to discover the functionality of instructions in a semi-automated fashion.  Lecture material covers the design of the system; programming exercises see the students implement components of the IR framework, and command-line exercises see the students discover the functionality of instructions using the synthesis tool they helped construct.
    • X86-Specific Considerations
      • X86 has a few quirks that must be taken into account during binary analysis.  This module details them and their solutions.
  5. SMT-Based Binary Program Analysis
    • The capstone of the course presents a smorgasbord of SMT-based binary program analysis applications.  We cover both straight-line and branching program analyses.  Applications are most often accompanied by code, and student exercises involve solving real-world reverse engineering problems making use of the framework we have built throughout the course.
      • Input Generation
      • Formal Verification
      • Equivalence Checking
      • Memory Access Prediction
      • Semantics-Based ROP Gadget Discovery
      • Quantitative Influence Calculation
      • Deobfuscation of Pattern-Based Obfuscators
      • Detection of Opaque Predicates
      • Symbolic Program Synthesis
      • Symbolic Execution
      • Concolic Execution
      • Increasing Code Coverage
      • White-Box Fuzzing
      • Deviation Detection

STUDENT TESTIMONIALS

Student #1

The course certainly fulfilled my expectations, which were originally to gain a more fundamental understanding of how SMT solvers work and how I might be able to apply them to my research. Having done the course, I feel that with some subsequent review of the material provided, I'll be able to both better understand how existing projects I've worked on have used SMT solvers (which were previously a black-box to me) as well as identify new applications for them. The provided Python framework will prove valuable in inspiring further research in the area, I'm sure.

The materials were very helpful - the slides and other written materials were comprehensive but also appropriately succinct. The instructor was obviously extremely knowledgeable in the fields of program analysis and reverse engineering, and was friendly and approachable. His main goal was obviously to impart his knowledge to the students in the most effective way possible.

I'd strongly recommend Rolf's course to anyone who has a background in binary reverse engineering and is interested in applying innovative new tools, methods and paradigms to their craft.

STUDENT #2

Rolf's Program Analysis course is a fantastic and challenging introduction into the applied use of mathematical tools to help reverse engineers and vulnerability researchers analyse and understand code at its lowest levels. It is a powerful first step towards making higher mathematics one of the first and most useful skills in the analyst's toolbox.

Student #3

I thought the course was great! Despite having some (minimal) prior experience with SMT solvers through using symbolic execution tools, this course was terrific in building my understanding of how the underlying ideas and tools worked without going too much into the nitty-gritty mathematics. There were loads of exercises that allowed us to get our hands dirty, and the material was very well organised.

Student #4

I enjoyed the coherent overview of how all these different elements of Program Analysis came together. I felt like I could see the "big picture" come together. The coding exercises were instructive without being overbearing. The code was clear well-written and easy to work with.

The course materials were comprehensible overall.  (Which is really good considering that I feel a less well-structured course could have been really overwhelming). There's a lot of theory that program analysis builds on and the break-up between lecturing and exercises helped us digest the material we'd learned. Having a copy of the code and slides will make it easy to revisit what we learnt should I need to recall anything.

Rolf's deep knowledge in the field of program analysis extends to both the theoretical and the practical aspects, as he shows you how to build a binary program analyser from scratch. He expresses insight and experience in a range of program analysis domains, from the abstract concepts of logic languages to the subtleties of x86 processors.

I'd recommend Rolf's course as a first step into program analysis to anyone with a basic background in discrete mathematics and computer science, and an interest in program analysis. It introduces the fundamentals of the field simply and efficiently, building up a coherent picture of the elements of program analysis like a jigsaw, accumulating a toolbox.

Student #5

The course covered everything that I was expecting and more. Both theory and application were covered in a way that makes it useful in practical reverse engineering.  The material was in depth and the x86 ISA information was a useful bonus.

The course materials were excellent. The slides and exercises complimented each other very well.

I very much enjoyed [the instructor's] presentation style. The conversational style made it easy to follow and understand the complex information being presented. It is obvious that [the instructor has] extensive knowledge in this area and is able to present it in an engaging and comprehensible way.

I would highly recommend this course to anyone looking to improve their reverse engineering knowledge.

Student #6

I learned heaps about everything I expected to and more. The course crammed in a lot of the necessary foundational background at a pace just slow enough to keep up with, and practical coding exercises along the way that kept it all grounded and relevant. The diversity of applications was a lot higher than I expected, and it was all presented in an elegant and cohesive package.

I thought the loosely structured discussion at the end where you threw in a bunch of conference presentations added a lot of valuable context showing how SMT fits into the program analysis ecosystem as a whole.

It was an easily accessible, comprehensive, practical and in-depth, introduction to the field.  [The instructor provided] good clear answers to all the curly questions, and a seemingly endless supply of references to relevant work in the field.

Student #7

Rolf's course provided great insight into this complex field, with hands on exercises which helped press home the techniques and included examples of real applications.  With Rolf's depth of understanding came clear explanations of RE applications of the underlying theory. Head exploding stuff.  This is a great introduction to this evolving area. With Rolf's experience the theory jumped into real world applications.

Student #8

I thoroughly enjoyed Rolf's "SMT-Based Binary Analysis from the Ground-Up" course. Rolf did a great job explaining quite complicated topics in an accessible fashion. The practical exercises included in the course consolidated the theoretical components and I would recommend this course to anyone interested in automated reverse engineering research and/or applications.

The exercises and course notes were very detailed and comprehensible. The notes will be a valuable resource for further study in the field.

Student #9

I was looking for a good overview of the area and to understand how SMT solvers can be utilized for practical applications.  The programming examples were mostly well chosen.  The lecture material was well written.  This course is a good course for understanding SMT solvers. It provides explanatory lecture material and programming exercises that help with understanding the concepts.