Originally published June 18th, 2012 on OpenRCE.

The goal of my RECON 2012 keynote speech was to introduce methods in academic program analysis and demonstrate -- intuitively, without drawing too much on formalism -- how they can be used to solve practical problems that are interesting to industrial researchers in the real world. Given that it was the keynote speech, and my goal of making the material as accessible as possible, I attempted to make my points with pictures instead of dense technical explanations. As a result, one might consider this presentation to be a friendly (but decidedly incomplete) introduction to binary program analysis as opposed to a rigorous mathematical monograph. The presentation features five detailed expositions of applying static program analysis (abstract interpretation and SMT solving) towards practically-interesting reverse engineering problems. (Aside: it's quite challenging to present this material without using terms such as "lattice", "Galois connection", etc.!)

Unfortunately, due to an error with the camera, the recording of the talk does not exist. This is problematic: I failed somewhat in walking the sharp edge of Einstein's razor, "as simple as possible, but no simpler" -- it was in fact made simpler than what was possible, and some important details (for example, about relational abstract interpretation and reduced products) were included in the spoken material but not the actual slides. Therefore, the learned reader is advised to imagine judiciously-placed asterisks and the accompanying errata, and the untutored pupil would be well-advised to recognize the incomplete and intuitive nature of the exposition and perhaps consult this program analysis reading list.

~~I would like to give the talk at some other conference at which the video can be reliably recorded, so that it may be published online.~~

Here are the slides.

Originally published on April 4th, 2008. This post won Honorable Mention for Most Innovative Research at the Pwnie Awards 2008.

There are two types of virtual machine software protections:  A) the ones that convert x86 machine code into virtual machine bytecode and execute it at runtime; B) the ones that execute some arbitrary code in a virtual environment.  I've discussed the latter several times in the past, and by now there exists a wealth of literature on that variety.  But breaking the former kind remains an unsolved problem.

In my article I said "basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it's repetitious, and the high-level details are obscured by the flood of low-level details".  The more I thought about this, the more I realized that the word "basically" is out of place:  virtualizing software protections are programming language interpreters, albeit for weird languages.

Consequently, an idea struck me:  what we want here is not an interpreter, but a compiler to compile the bytecode back into x86 machine code.  I spent a week coding one (~1000 lines) in OCaml to test this theory, and I'm able to report that, indeed, it works.  I chose ReWolf's x86 Virtualizer, a simple target that uses some of the same techniques as the heavy hitters in this area.  Here is a walkthrough of the analysis and recompilation of a small function with one basic block.  The compiler works equally well for arbitrarily-large functions, although that would make this posting unnecessarily long and complicated.

Step -2:  Protect something with the virtualizer.  In this case I just used ReWolf's sample executable itself.

.text:00401896 call ds:GetTickCount
.text:0040189C push eax
.text:0040189D call _srand
.text:004018A2 pop ecx
.text:004018A3 push 0
.text:004018A5 push offset DialogFunc
.text:004018AA push 0
.text:004018AC push 65h
.text:004018AE push [esp+10h+hInstance]
.text:004018B2 call ds:DialogBoxParamA
.text:004018B8 xor eax, eax
.text:004018BA retn 10h

Step -1: Analyze the virtual machine. Although this was not strictly necessary in this case because ReWolf provided source code, I decided to ignore it and reverse the VM manually, since you don't always have such niceties.

Step 0: Break the polymorphism in the instruction set. I made use of two remarkably ghetto hacks here, one of which may be considered elegant. To avoid provoking any arms races I'll omit the details.

Step 1: Disassemble the relevant region into VM bytecode. In the process, construct a graph in which each vertex is an instruction, and the edges are the flows between them.

.VM:004131D0 db 0C2h, 0C9h, 0C0h, 0BDh, 14h, 0DFh, 63h, 9Ah, 86h, 5Eh, 50h, 30h, 0Bh
.VM:004131D0 db 0Ah, 0C0h, 0C7h, 0CEh, 5Eh, 44h, 0E1h, 0E0h, 0C7h, 0FCh, 0FDh, 12h
.VM:004131D0 db 10h, 50h, 0D8h, 0D2h, 0DBh, 0A6h, 3Dh, 34h, 0C9h, 12h, 0DEh, 0E5h, 4Bh
.VM:004131D0 db 2Ch, 2Eh, 6Eh, 23h, 21h, 27h, 0E2h, 0E5h, 0ECh, 99h, 14h, 13h, 0C2h
.VM:004131D0 db 0E5h, 0F9h, 0FDh, 0F4h, 38h, 14h, 0F7h, 0F0h, 0F9h, 0ABh, 79h, 6, 0D7h
.VM:004131D0 db 0F0h, 8Bh, 88h, 81h, 41h, 87h, 8Ch, 85h, 0F8h, 51h, 9Ah, 26h, 0DFh
.VM:004131D0 db 0CFh, 1Eh, 15h, 75h, 76h, 74h, 6Bh, 98h, 9Dh, 94h, 6Eh, 0Ch, 6Bh, 90h
.VM:004131D0 db 93h, 9Ah, 0Fh

becomes

vertexlist =
[{label = 84; instruction = VMExit 16l};
{label = 81; instruction = LiteralInstruction [|51; 192|]};
{label = 69; instruction = ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l)};
{label = 65; instruction = PushDereferencedTemp};
{label = 57; instruction = AddImmediateToTemp 20l};
{label = 52; instruction = AddRegisterToTemp Esp};
{label = 44; instruction = SetTemp 0l};
{label = 41; instruction = LiteralInstruction [|106; 101|]};
{label = 38; instruction = LiteralInstruction [|106; 0|]};
{label = 27; instruction = ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l)};
{label = 24; instruction = LiteralInstruction [|106; 0|]};
{label = 22; instruction = LiteralInstruction [|89|]};
{label = 14; instruction = X86Call 6471l};
{label = 12; instruction = LiteralInstruction [|80|]};
{label = 0;instruction = ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l)}];
edgelist =
[({contents = {label = 0}},{contents = {label = 12}});
({contents = {label = 12}}, {contents = {label = 14}});
({contents = {label = 14}}, {contents = {label = 22}});
({contents = {label = 22}}, {contents = {label = 24}});
(* Lots and lots of edges removed *)]

Step 2: Form basic blocks within the instruction-level CFG. The previous output becomes:

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|]; 
X86Call 6471l; 
LiteralInstruction [|89|];
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
SetTemp 0l; 
AddRegisterToTemp Esp; 
AddImmediateToTemp 20l;
PushDereferencedTemp;
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|]; VMExit 16l|]}];

Step 3: Optimize the code within the basic block. The goal is to convert sequences of VM instructions into a new language more conducive to being compiled back into X86. The optimizer is the most powerful component of my compiler: it can remove obfuscation automatically simply as a side-effect of being an optimizer (not that ReWolf's has any, but others do), and employs no pattern matching.

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|]; 
X86Call 6471l;
LiteralInstruction [|89|];
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
SyntheticInstruction (Push, Plus (Constant 20l, Register Esp));
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|]; 
VMExit 16l|]}];

Step 4: Recompile all virtual instructions into x86 machine language.

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|];
RelativeFixupInstruction ([|232; 0; 0; 0; 0|], 6471l, 1l);
LiteralInstruction [|89|]; 
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
LiteralInstruction [|255; 116; 36; 20|];
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|];
LiteralInstruction [|194; 16; 0|]|]}];

Step 5: Stuff the original bytes back into the binary and perform fixups specified. If you can convert between hex and decimal in your head, you'll notice that the bytes above correspond to those below, modulo fixups. For multi-basic-block functions, this is harder, as you have to sequence the blocks and decide between short and long jumps.

.VM:004131D0 FF 15 28 A0 40 00 call ds:GetTickCount
.VM:004131D6 50                push eax
.VM:004131D7 E8 6B E7 FE FF    call loc_401947
.VM:004131DC 59                pop  ecx
.VM:004131DD 6A 00             push 0
.VM:004131DF 68 F0 16 40 00    push offset loc_4016F0
.VM:004131E4 6A 00             push 0
.VM:004131E6 6A 65             push 65h
.VM:004131E8 FF 74 24 14       push dword ptr [esp+14h]
.VM:004131EC FF 15 48 A1 40 00 call ds:DialogBoxParamA
.VM:004131F2 33 C0             xor eax, eax
.VM:004131F4 C2 10 00          retn 10h

Step 6: Celebrate. ReWolf's X86 Virtualizer was simple, and surely breaking the harder ones is, well, harder, but I believe that the general principles espoused here should be applicable to the others.

Here is the source code.