Control Flow Deobfuscation via Abstract Interpretation

Work done May 30th, 2010.  Originally published on July 9th, 2011 on OpenRCE.

I present some work that I did involving automatic deobfuscation of obfuscated control flow constructs with abstract interpretation.  Considering the image below, this project is responsible for taking graphs like the one on the left (where most of the "conditional" branches actually only go in one direction and are only present to thwart static analysis) and converting them into graphs like the one on the right.

Much work on deobfuscation relies on pattern-matching at least to some extent; I have coded such tools myself.  I have some distaste for such methods, since they stop working when the patterns change (they are "syntactic").  I prefer to code my deobfuscation tools as generically ("semantically") as possible, such that they capture innate properties of the obfuscation method in question, rather than hard-coding individual instances of the obfuscation.

The slides present a technique based on abstract interpretation, a form of static program analysis, for deobfuscating control flow transfers.  I translate the x86 code into a different ("intermediate") language, and then perform an analysis based on three-valued logic over the translated code.  The end result is that certain classes of opaque predicates (conditional jumps that are either always taken or always not taken) are detected and resolved.  I have successfully used this technique to break several protections making use of similar obfuscation techniques.  

Although I invented and implemented these techniques independently, given the wealth of work in program analysis, it wouldn't surprise me to learn that the particular technique has been previously invented.  Proper references are appreciated.

Code is also included.  The source relies upon my Pandemic program analysis framework, which is not publicly available.  Hence, the code is for educational purposes only.  Nonetheless, I believe it is one of very few examples of publicly-available source code involving abstract interpretation on binaries.

PPTX presentation, OCaml source code (for educational purposes only -- does not include my framework.)