Compiler 1, X86 Virtualizer 0

Originally published on April 4th, 2008.  This post won Honorable Mention for Most Innovative Research at the Pwnie Awards 2008.

There are two types of virtual machine software protections:  A) the ones that convert x86 machine code into virtual machine bytecode and execute it at runtime; B) the ones that execute some arbitrary code in a virtual environment.  I've discussed the latter several times in the past, and by now there exists a wealth of literature on that variety.  But breaking the former kind remains an unsolved problem.

In my article I said "basically, reverse engineering a VM with the common tools is like reverse engineering a scripted installer without a script decompiler: it's repetitious, and the high-level details are obscured by the flood of low-level details".  The more I thought about this, the more I realized that the word "basically" is out of place:  virtualizing software protections are programming language interpreters, albeit for weird languages.

Consequently, an idea struck me:  what we want here is not an interpreter, but a compiler to compile the bytecode back into x86 machine code.  I spent a week coding one (~1000 lines) in OCaml to test this theory, and I'm able to report that, indeed, it works.  I chose ReWolf's x86 Virtualizer, a simple target that uses some of the same techniques as the heavy hitters in this area.  Here is a walkthrough of the analysis and recompilation of a small function with one basic block.  The compiler works equally well for arbitrarily-large functions, although that would make this posting unnecessarily long and complicated.

Step -2:  Protect something with the virtualizer.  In this case I just used ReWolf's sample executable itself.

.text:00401896 call ds:GetTickCount
.text:0040189C push eax
.text:0040189D call _srand
.text:004018A2 pop ecx
.text:004018A3 push 0
.text:004018A5 push offset DialogFunc
.text:004018AA push 0
.text:004018AC push 65h
.text:004018AE push [esp+10h+hInstance]
.text:004018B2 call ds:DialogBoxParamA
.text:004018B8 xor eax, eax
.text:004018BA retn 10h

Step -1:  Analyze the virtual machine.  Although this was not strictly necessary in this case because ReWolf provided source code, I decided to ignore it and reverse the VM manually, since you don't always have such niceties.


Step 0:  Break the polymorphism in the instruction set.  I made use of two remarkably ghetto hacks here, one of which may be considered elegant.  To avoid provoking any arms races I'll omit the details.

Step 1:  Disassemble the relevant region into VM bytecode.  In the process, construct a graph in which each vertex is an instruction, and the edges are the flows between them.

.VM:004131D0 db 0C2h, 0C9h, 0C0h, 0BDh, 14h, 0DFh, 63h, 9Ah, 86h, 5Eh, 50h, 30h, 0Bh
.VM:004131D0 db 0Ah, 0C0h, 0C7h, 0CEh, 5Eh, 44h, 0E1h, 0E0h, 0C7h, 0FCh, 0FDh, 12h
.VM:004131D0 db 10h, 50h, 0D8h, 0D2h, 0DBh, 0A6h, 3Dh, 34h, 0C9h, 12h, 0DEh, 0E5h, 4Bh
.VM:004131D0 db 2Ch, 2Eh, 6Eh, 23h, 21h, 27h, 0E2h, 0E5h, 0ECh, 99h, 14h, 13h, 0C2h
.VM:004131D0 db 0E5h, 0F9h, 0FDh, 0F4h, 38h, 14h, 0F7h, 0F0h, 0F9h, 0ABh, 79h, 6, 0D7h
.VM:004131D0 db 0F0h, 8Bh, 88h, 81h, 41h, 87h, 8Ch, 85h, 0F8h, 51h, 9Ah, 26h, 0DFh
.VM:004131D0 db 0CFh, 1Eh, 15h, 75h, 76h, 74h, 6Bh, 98h, 9Dh, 94h, 6Eh, 0Ch, 6Bh, 90h
.VM:004131D0 db 93h, 9Ah, 0Fh

becomes

vertexlist =
[{label = 84; instruction = VMExit 16l};
{label = 81; instruction = LiteralInstruction [|51; 192|]};
{label = 69; instruction = ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l)};
{label = 65; instruction = PushDereferencedTemp};
{label = 57; instruction = AddImmediateToTemp 20l};
{label = 52; instruction = AddRegisterToTemp Esp};
{label = 44; instruction = SetTemp 0l};
{label = 41; instruction = LiteralInstruction [|106; 101|]};
{label = 38; instruction = LiteralInstruction [|106; 0|]};
{label = 27; instruction = ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l)};
{label = 24; instruction = LiteralInstruction [|106; 0|]};
{label = 22; instruction = LiteralInstruction [|89|]};
{label = 14; instruction = X86Call 6471l};
{label = 12; instruction = LiteralInstruction [|80|]};
{label = 0;instruction = ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l)}];
edgelist =
[({contents = {label = 0}},{contents = {label = 12}});
({contents = {label = 12}}, {contents = {label = 14}});
({contents = {label = 14}}, {contents = {label = 22}});
({contents = {label = 22}}, {contents = {label = 24}});
(* Lots and lots of edges removed *)]

Step 2:  Form basic blocks within the instruction-level CFG.  The previous output becomes:

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|]; 
X86Call 6471l; 
LiteralInstruction [|89|];
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
SetTemp 0l; 
AddRegisterToTemp Esp; 
AddImmediateToTemp 20l;
PushDereferencedTemp;
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|]; VMExit 16l|]}];
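
The block-forming step can be illustrated with a short sketch. This is Python rather than the OCaml of the actual compiler, and it is not taken from its source; the names basic_blocks, labels, edges and entry are hypothetical. It assumes the usual definition: an instruction starts a new block unless it has exactly one predecessor and that predecessor has exactly one successor.

```python
from collections import defaultdict

def basic_blocks(labels, edges, entry):
    """Group instruction labels into basic blocks.

    labels: iterable of instruction labels (the vertex list)
    edges:  iterable of (source_label, destination_label) flow pairs
    entry:  label of the first instruction, always treated as a block leader
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    def is_leader(v):
        # A vertex starts a block unless it has exactly one predecessor
        # and that predecessor flows to it and nowhere else.
        if v == entry or len(preds[v]) != 1:
            return True
        return len(succs[preds[v][0]]) != 1

    blocks = {}
    for v in labels:
        if not is_leader(v):
            continue
        block, cur = [v], v
        # Extend the block along single-successor edges until the next leader.
        while len(succs[cur]) == 1 and not is_leader(succs[cur][0]):
            cur = succs[cur][0]
            block.append(cur)
        blocks[v] = block
    return blocks

# The first few edges of the instruction-level graph form one straight-line block.
edges = [(0, 12), (12, 14), (14, 22), (22, 24)]
print(basic_blocks([24, 22, 14, 12, 0], edges, entry=0))
```

Keying each block by its leader's label mirrors the way the listing above keeps label = 0 for the merged vertex.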

Step 3:  Optimize the code within the basic block.  The goal is to convert sequences of VM instructions into a new language more conducive to being compiled back into X86.  The optimizer is the most powerful component of my compiler:  it can remove obfuscation automatically simply as a side-effect of being an optimizer (not that ReWolf's has any, but others do), and employs no pattern matching.

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|]; 
X86Call 6471l;
LiteralInstruction [|89|];
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
SyntheticInstruction (Push, Plus (Constant 20l, Register Esp));
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|]; 
VMExit 16l|]}];
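
To give a feel for what happened to the four temp-register instructions, here is a deliberately tiny Python sketch. It is not the optimizer described above (which is written in OCaml and is far more general); it only tracks the VM's temp symbolically as "register + constant" and emits one synthetic push when the temp is dereferenced. The tuple encoding of instructions and the name SyntheticPushDeref are assumptions made for illustration, as is the premise that the temp is dead after the push.

```python
TEMP_OPS = {"SetTemp", "AddImmediateToTemp", "AddRegisterToTemp",
            "PushDereferencedTemp"}

def fold_temp(instructions):
    """Fold temp-register arithmetic into a single synthetic push.

    Instructions are tuples such as ("SetTemp", 0) or
    ("AddRegisterToTemp", "Esp"). The temp is modelled as
    (register_or_None, constant); anything richer is rejected.
    """
    out = []
    temp = None  # unknown until the first SetTemp
    for ins in instructions:
        op = ins[0]
        if op not in TEMP_OPS:
            out.append(ins)  # not a temp operation: pass through untouched
        elif op == "SetTemp":
            temp = (None, ins[1] & 0xFFFFFFFF)
        elif temp is None:
            raise ValueError("temp used before SetTemp")
        elif op == "AddImmediateToTemp":
            temp = (temp[0], (temp[1] + ins[1]) & 0xFFFFFFFF)
        elif op == "AddRegisterToTemp":
            if temp[0] is not None:
                raise NotImplementedError("only one register is modelled")
            temp = (ins[1], temp[1])
        else:
            # PushDereferencedTemp: push dword ptr [register + constant]
            out.append(("SyntheticPushDeref", temp[0], temp[1]))
    return out

block = [("LiteralInstruction", [106, 101]), ("SetTemp", 0),
         ("AddRegisterToTemp", "Esp"), ("AddImmediateToTemp", 20),
         ("PushDereferencedTemp",)]
print(fold_temp(block))
```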

Step 4:  Recompile all virtual instructions into x86 machine language.

vertexlist =
[{label = 0;
 instruction =
[|ImagebaseFixupInstruction ([|255; 21; 40; 160; 0; 0|], 2l);
LiteralInstruction [|80|];
RelativeFixupInstruction ([|232; 0; 0; 0; 0|], 6471l, 1l);
LiteralInstruction [|89|]; 
LiteralInstruction [|106; 0|];
ImagebaseFixupInstruction ([|104; 240; 22; 0; 0|], 1l);
LiteralInstruction [|106; 0|]; 
LiteralInstruction [|106; 101|];
LiteralInstruction [|255; 116; 36; 20|];
ImagebaseFixupInstruction ([|255; 21; 72; 161; 0; 0|], 2l);
LiteralInstruction [|51; 192|];
LiteralInstruction [|194; 16; 0|]|]}];

Step 5:  Stuff the recompiled bytes back into the binary and perform the specified fixups.  If you can convert between hex and decimal in your head, you'll notice that the bytes above correspond to those below, modulo fixups.  For multi-basic-block functions, this is harder, as you have to sequence the blocks and decide between short and long jumps.

.VM:004131D0 FF 15 28 A0 40 00 call ds:GetTickCount
.VM:004131D6 50                push eax
.VM:004131D7 E8 6B E7 FE FF    call loc_401947
.VM:004131DC 59                pop  ecx
.VM:004131DD 6A 00             push 0
.VM:004131DF 68 F0 16 40 00    push offset loc_4016F0
.VM:004131E4 6A 00             push 0
.VM:004131E6 6A 65             push 65h
.VM:004131E8 FF 74 24 14       push dword ptr [esp+14h]
.VM:004131EC FF 15 48 A1 40 00 call ds:DialogBoxParamA
.VM:004131F2 33 C0             xor eax, eax
.VM:004131F4 C2 10 00          retn 10h
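
For readers who would rather not do the arithmetic by hand, the two kinds of fixup can be sketched in Python. This is a reading of the listings, not the compiler's code. It assumes an image base of 0x400000 (inferred from the addresses shown), that the trailing number in each fixup record is the byte offset of the dword to patch, and that 6471 in the call is an RVA. The function names are hypothetical.

```python
import struct

IMAGEBASE = 0x400000  # assumed from the .text/.VM addresses in the listings

def imagebase_fixup(code, offset, imagebase=IMAGEBASE):
    """Add the image base to the little-endian dword at `offset`."""
    buf = bytearray(code)
    (rva,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, (rva + imagebase) & 0xFFFFFFFF)
    return bytes(buf)

def relative_fixup(code, target_rva, offset, address, imagebase=IMAGEBASE):
    """Patch a rel32 operand so the instruction at `address` reaches the target."""
    buf = bytearray(code)
    next_ip = address + len(buf)  # rel32 is relative to the next instruction
    rel = (imagebase + target_rva - next_ip) & 0xFFFFFFFF
    struct.pack_into("<I", buf, offset, rel)
    return bytes(buf)

# call ds:GetTickCount, then the call placed at 004131D7
print(imagebase_fixup([255, 21, 40, 160, 0, 0], 2).hex(" "))
print(relative_fixup([232, 0, 0, 0, 0], 6471, 1, 0x4131D7).hex(" "))
```

The two printed byte strings can be compared against the first and third lines of the disassembly above.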

Step 6:  Celebrate.  ReWolf's X86 Virtualizer was simple, and surely breaking the harder ones is, well, harder, but I believe that the general principles espoused here should be applicable to the others.


Here is the source code.

Industrial-Grade Binary-Only Profiling and Coverage

Originally published on February 16th, 2008 on OpenRCE

There are a few options for profiling or performing code-coverage analysis on a per-module binary level:

* Run traces (very slow and generate a huge amount of uninteresting data, but it works);
* MSR tracing (strengths and weaknesses remain to be seen, but seems fairly promising);
* BinNavi/CoverIt/PaiMei/presumably Inspector:  put a breakpoint on every function you found in a static disassembly (doesn't work in general; I explained why here)

There are more options rooted in academia, the most practical of which is dynamic binary instrumentation (DBI), the technology behind tools such as Valgrind and DynamoRIO.  The inner workings of this technology are very interesting, but they are rather involved and their precise technical details are beyond the scope of this entry.  Informally speaking, these tools disassemble a basic block, convert the instructions into an intermediate language like the ones you find inside of a compiler, and finally re-compile the IL with the "instrumentation" code baked directly into the new assembly language.  For more information, read the original Ph.D. thesis describing Valgrind and then read the source to libVEX, a component thereof.  Valgrind is slow and Linux-only, but DynamoRIO was specifically designed with speed in mind (hence the "Dynamo") and runs on Windows.

Here I present a DynamoRIO extension for code coverage and profiling.  It works on a function-level (although block-level support could be added easily -- the source weighs in at a measly 70 lines in 2kb, so if you want some other feature, just code it), and it can either be a profiler or a code coverage analyzer.  All it does is instrument the code such that each call instruction, direct or indirect, will write its source and target addresses into a file.  This data can then be used for either profiling or code coverage purposes:  simply discard all of the duplicates for the latter, and use the data as-is for the former.  This is just the back-end, but I imagine that this could be easily integrated into PaiMei's front end to provide an industrial-grade coverage and profiling tool.
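
The post-processing really is that simple. Here is a hedged Python sketch; the log format (one "source target" pair of hex addresses per line) is an assumption, since the extension's actual output format is not described here, and the function names are hypothetical.

```python
from collections import Counter

def parse_call_log(lines):
    """Parse lines of the assumed form '<source hex> <target hex>'."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        pairs.append((int(fields[0], 16), int(fields[1], 16)))
    return pairs

def coverage(pairs):
    """Code coverage: the set of call targets, duplicates discarded."""
    return {target for _, target in pairs}

def profile(pairs):
    """Profiling: how many times each target was called."""
    return Counter(target for _, target in pairs)

log = ["4131d7 401947", "4131d7 401947", "4131ec 7c80b741"]  # made-up sample
pairs = parse_call_log(log)
print(sorted(hex(t) for t in coverage(pairs)))
print(profile(pairs).most_common(1))
```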

Strengths of DynamoRIO:
* speed (you might not even notice the slowdown);
* stability (there used to be a commercial security product based on this technology -- it is literally industrial grade);
* trivial to code extensions for (70 lines, 2kb for this simple yet powerful extension).

Weaknesses:
* definitely won't work with self-modifying code
* probably won't work with obfuscated or "self-protecting" code (there's particularly a problem with so-called "pc-relative" addressing, such as call $ / pop ebp).

Studious readers may note that automatic indirect call resolution is exceptionally useful for C++ reverse engineering;  comment out the direct call resolution, recompile, write a quick IDC script to add the x-refs to the disassembly listing, and you've got a killer C++ RE tool.  Credit goes to spoonm for having and implementing this idea initially.

Note that in the six years since this was published, new binary instrumentation tools such as Intel's PIN have emerged which ameliorate some of the weaknesses of the tools described in this post from 2008.

Code Generation Quirk Involving Array Indexing

Originally published on February 13th, 2008 on OpenRCE.

In this post, we shall investigate some strange-looking code generated in the context of an array index.

.text:10002D49 mov eax, [esp+arg_0]
.text:10002D4D lea ecx, [eax-9C40h]
.text:10002D53 cmp ecx, 50h
.text:10002D56 ja short loc_10002D60
.text:10002D58 mov eax, dword ptr ds:(loc_1000EF5B+1)[eax*8]
.text:10002D5F retn
.text:10002D60
.text:10002D60 loc_10002D60:
.text:10002D60 lea edx, [eax-0A029h]
.text:10002D66 cmp edx, 9
.text:10002D69 ja short loc_10002D73
.text:10002D6B mov eax, dword ptr ds:loc_1000D344[eax*8]
.text:10002D72 retn

We don't find any arrays at the locations referenced on lines -D58 and -D6B (in fact we find code) which is unusual:

; First target
.text:1000EF57 movzx eax, word ptr [esi+18h]
.text:1000EF5B loc_1000EF5B: ; DATA XREF: 10002D58
.text:1000EF5B add dword_10065280, eax
.text:1000EF61 xor eax, eax
.text:1000EF63 pop esi
.text:1000EF64 mov esp, ebp
.text:1000EF66 pop ebp

; Second target
.text:1000D342 mov esp, ebp
.text:1000D344 loc_1000D344: ; DATA XREF: 10002D6B
.text:1000D344 pop ebp

Looking closer at the code, the trick lies in the fact that the arrays are not being indexed starting at zero.

.text:10002D58 mov eax, dword ptr ds:(loc_1000EF5B+1)[eax*8] ; <- 0x9C40 <= eax <= 0x9C90
.text:10002D6B mov eax, dword ptr ds:loc_1000D344[eax*8] ; <- 0xA029 <= eax <= 0xA032

So the first array actually begins at 0x1000EF5B+1+0x9C40*8 == 0x1005D15C, and the second array begins at 0x1000D344+0x0A029*8 == 0x1005D48C.  What happened here is that the pointer expression has been simplified to conform to x86's instruction encoding:

[1005D15Ch + (eax - 0x9C40) * 8] => [1005D15Ch - 4E200h + eax*8] => [1000EF5Ch + eax*8]
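
The arithmetic is easy to check with a few lines of Python. The helper names are hypothetical; the constants come straight from the listing. The first helper also models the lea/cmp/ja idiom: ja is an unsigned comparison, so a single check covers both the lower and the upper bound.

```python
MASK32 = 0xFFFFFFFF

def in_range(eax, low, span):
    # lea reg, [eax-low] / cmp reg, span / ja out_of_range
    # ja branches only when the unsigned difference is strictly above span.
    return ((eax - low) & MASK32) <= span

def true_base(displacement, low, scale=8):
    # Address of the element selected by the smallest accepted index.
    return (displacement + low * scale) & MASK32

def folded_displacement(base, low, scale=8):
    # What the compiler emits once the "- low" is folded into the constant.
    return (base - low * scale) & MASK32

print(hex(true_base(0x1000EF5B + 1, 0x9C40)))
print(hex(true_base(0x1000D344, 0xA029)))
print(hex(folded_displacement(0x1005D15C, 0x9C40)))
```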

This is pretty uncommon; I've only seen it a handful of times in my reversing endeavors over the years.

Compiler Optimizations Regarding Structures

Originally published on January 22nd, 2008 on OpenRCE

Here are some optimizations that I have seen MSVC apply to structure references.  I wish I could give you the real names for these optimizations, but I can't find them in any of my compiler textbooks.  I have a feeling that they're buried away somewhere inside of Randy Allen and Ken Kennedy's incredibly dense tome, "Optimizing Compilers for Modern Architectures".  If anybody knows the real names for these transformations, please speak up.

#1:  Let's say we are accessing multiple entries in a structure that's larger than 80h.  Now as stated in the previous entry, each access to the members situated at >= 0x80 is going to require a dword in the instruction encoding if we generate the "naive" code.  If we instead do:

lea esi, [esi+middle_of_structure_somewhere]
; ...
mov eax, [esi-(middle_of_structure_somewhere - member_offset1)]
mov ebx, [esi+(member_offset2 - middle_of_structure_somewhere)]

We can access more of the structure with the one-byte instruction encoding, if those subtracted quantities are bytes.  The compiler chooses middle_of_structure_somewhere specifically to maximize the number of one-byte references.  This is the same idea behind the "frame pointer delta" stack-frame optimization.
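
As an illustration of what "maximize the number of one-byte references" means, here is a small Python sketch. How MSVC actually picks the value is not known to me; this brute-force search over candidate pivots, and the member offsets in the example, are assumptions for demonstration only.

```python
def one_byte_refs(pivot, offsets):
    """Count members whose displacement from pivot fits in a signed byte."""
    return sum(-0x80 <= off - pivot <= 0x7F for off in offsets)

def best_pivot(offsets):
    """Pick a pivot that maximizes one-byte displacements.

    Some optimal window [pivot-80h, pivot+7Fh] has a member at its lower
    edge, so trying pivot = offset + 80h for each member is enough.
    """
    candidates = sorted({off + 0x80 for off in offsets})
    return max(candidates, key=lambda p: one_byte_refs(p, offsets))

offsets = [0x90, 0xA0, 0x110, 0x150, 0x400]  # hypothetical member offsets
pivot = best_pivot(offsets)
print(hex(pivot), one_byte_refs(pivot, offsets), one_byte_refs(0, offsets))
```

With the base register left pointing at the start of the structure (pivot 0), none of these members fits in a byte displacement; after the lea, four of the five do.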

#2:  Let's say we have a loop that accesses two arrays of structures inside of another structure, one array beginning at +1234h, the other beginning at +2234h.  If we emit the "naive" code:

; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)
; ...
mov eax, [esi+1234h+ebx+offset_of_member1]
mov edi, [esi+2234h+edx+offset_of_member2]

Then obviously both of these structure displacements are going to require a separate dword in the instruction encoding for 1234h+offset_of_member1 and 2234h+offset_of_member2.  If we instead do:

lea esi, [esi+1234h]
; ...
; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)
; ...
mov eax, [esi+ebx+offset_of_member1]
mov edi, [esi+1000h+edx+offset_of_member2]

Then if offset_of_member1 is a byte, it's only going to require a byte in the instruction encoding, thus saving three bytes per reference to the first structure (we can combine the previous optimization to place esi such that the number of one-byte references is maximized).  Alternatively, if more members in the second structure are accessed than those in the first, we'll see:

lea esi, [esi+2234h]
; ...
; ecx = loop induction variable
imul ebx, ecx, sizeof(structure1)
imul edx, ecx, sizeof(structure2)
; ...
mov eax, [esi+ebx+offset_of_member1-1000h]
mov edi, [esi+edx+offset_of_member2]

Once again, the first optimization can also be applied here to choose the optimal placement for esi that maximizes the number of single-byte references.  The multiplications given in the second optimization can also be optimized away into additions.

Byte Search in Structure Recovery

Originally published on January 21st, 2008 on OpenRCE

Here's an old and simple trick that I use extensively while recovering large structures in C++ reversing.  Briefly, the challenges in structure recovery are to determine:

  1. The size of the structure;
  2. The inheritance hierarchy that relates this structure to others;
  3. The location of the members within the structure;
  4. The data types of the members;
  5. All of the locations in the code where the structure is used;
  6. The holistic picture:  the overall purpose of the structure and the contributions of each data member to that.

This entry is concerned with point #5.  Let's assume that we know (#3) where a particular member within a structure is situated.  In order to figure out (#4) its data type and (#6) its purpose, we should inspect (#5) the locations at which this data member is used.  We might get lucky; maybe we'll find something like this:

mov eax, [esi+Structure.field_XYZ]
push eax
push offset fmtstr ; "%s:Loading into memory for emulation\n"
call LoggingFunction
add esp, 8

From this we can infer both the data type (char *) and functionality (it's a pointer to the name of the file that is about to be emulated), and draw a conclusion about the overall structure (that it's probably related to emulation).  Perhaps we won't get as lucky as this scenario, but maybe a more subtle clue is revealed by one of the references.  So, how do we find other locations at which this structure member is being used?

The obvious answer would be to text-search for the phrase "[reg32+0XYZh]", but this method has a few drawbacks:

A)  It's slow;
B)  It relies upon the disassembler properly distinguishing code from data, which is in fact impossible to solve generally due to equivalence with the halting problem (a result of indirect addressing, which is the bread and butter of C++'s implementation of polymorphism via function pointers);
C)  Finding the string above just tells us that field_XYZ in *some* structure is being used, not necessarily our particular structure of interest.

Point C is critical and bears closer inspection; how can we be sure that the results we are finding actually refer to the structure that interests us?  Let's examine the situation for some specific values of XYZ:

Q:  How many structures contain a member defined at XYZ = +0?
A:  All of them.  Therefore if we were to search for [reg32], we could make no guarantees about which structure is actually being used (or even that a structure is being used, period).

Q:  How many structures contain a member defined at XYZ = +4?
A:  Most of them.  The same comment from above applies.

Q:  How many structures contain a member defined at XYZ = +40?
A:  Few of them.  In my experience a program generally contains proportionally very many structures that have size 40h or less, and proportionally very few structures larger than that.

Q:  How many structures contain a member defined at XYZ = +X, where X >= 0x80?  X >= 0x100?  X >= 0x1000?  X >= 0x10000?  X >= 0x80 and X is not dword-boundary-aligned?  X >= 0x80 and X is not a multiple of a high power of two?
A:  The larger the structure, the better the chance that the locations of its data members are unique, or if not unique, then at least that the structures were derived from a common base class.

The first lesson is that the higher the offset within the structure, the fewer structures are going to have data members defined at that offset, which means that offset searching begins to become feasible for these high-offset data members.  Point C from above is addressed.

On point B, let's briefly look at some characteristics of instruction encoding on x86.  Below are some typical structure references:

8B 16 mov edx, [esi] ; notice that the +0 is not present in the encoding
66 89 42 36 mov [edx+36h], ax ; notice that the +36h is present as a byte in the encoding
8B 8E AC 5F 00 00 mov ecx, [esi+5FACh] ; notice that the +5FACh is present as a dword in the encoding

 

Displacements off of a register that fit into a single signed byte, e.g. [esi-80h] ... [esi+7Fh], are represented with a single signed byte in the instruction's encoding, e.g. the 36h from the above.  But searching for a byte is no good; a single byte could appear in any context.  Displacements off of a register that are outside of this small window, e.g. [esi+80h], are represented with a dword in the instruction's encoding, e.g. the AC 5F 00 00 from the above.  Therefore, any time one of these high-offset structure members is accessed directly, we're going to see an entire dword in the instruction stream that corresponds to the offset of the structure member.  Searching for an entire dword gives much more precise results than searching for a byte.


Now all of the machinery is in place for the real point of this entry.  Suppose we can't figure out field_5FAC's data type or functionality, and we would like to see other references to that member to see if they provide any clues.  We could text search for the regular expression [.*+5FACh], and we would be reasonably sure that we were finding references to our structure of interest, or at least structures from the same family, but it would be slow, and would only find references that were defined as code.

This is where IDA's "binary search" feature, alt-B or Search->Sequence of bytes..., comes in handy.  Enter AC 5F 00 00 into the window.  IDA instantaneously brings up a window with sixty-nine lines of code, all of which have the form "mov reg32, [reg32+5FACh]" or "mov [reg32+5FACh], reg32".  There is one additional result of the form "db 0ACh", which, when the surrounding bytes are turned into code, is revealed to be a structure reference of the aforementioned variety.  None of the results are false positives.
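
Outside of IDA, the same search takes only a few lines of Python. This is a sketch of the idea, not a replacement for IDA's feature: it scans a raw byte buffer for the little-endian dword encoding of a structure offset. The function name is hypothetical, and the sample buffer is just the three instructions shown earlier, concatenated.

```python
import struct

def find_dword_refs(blob, offset):
    """Return every position in blob where offset appears as a little-endian dword."""
    needle = struct.pack("<I", offset)  # 0x5FAC -> AC 5F 00 00
    hits = []
    pos = blob.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(needle, pos + 1)
    return hits

# mov edx, [esi] / mov [edx+36h], ax / mov ecx, [esi+5FACh]
code = bytes.fromhex("8B16" "66894236" "8B8EAC5F0000")
print(find_dword_refs(code, 0x5FAC))
```

Each hit is the position of the displacement itself, so the instruction that contains it starts a few bytes earlier.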

The point of this blog entry was to say that, the larger a structure becomes, the more "unique" the addresses of the members within the structure become, and due to the instruction encoding on x86, we can find all direct references to the high-offset structure members quickly, easily, and with few to no false positives using IDA's binary search feature.

CommWarrior.B Thorough IDB (ARM/C++)

This was originally posted on January 3rd, 2008 on OpenRCE.

This is the IDB for a nasty little SymbianOS worm that I reverse engineered in February of 2006.  The project was more difficult than most in several respects.  I'd only ever done one ARM project before this, and so I found myself referencing the ARM documentation.  I had no familiarity with the SymbianOS API, which turns out to be object-oriented from start to finish.  Apart from that, the author made extensive use of the object-oriented features of C++ in his non-API-related code; the project was the most intensely object-oriented one that I had done up until that time.  Plus, this excellent document on SymbianOS reversing had not been released yet.  I also did not have access to hardware upon which to run the worm, and so the project had to be conducted purely statically.  Finally, I had never used a mobile phone before and was unfamiliar with all of this fancy SMS and Bluetooth stuff -- yeah, I'm a luddite.

I also did a decompilation for this, but I think that releasing it would do more harm than good.  Mobile phone worms are lame, and the world does not need more of them.

Make sure to check out the database notepad.  Enjoy!

ProcDump 1.62 Thorough IDB

Originally published October 7, 2007 on OpenRCE

After some deliberation, I have decided to release my thorough IDB for ProcDump 1.62 Final, which is substantially more detailed than the original ASM source code itself.  If you care to study it, you can learn a great deal about coding dynamic reversing tools and static reversing.

At the time I analyzed this, in late 2003, it was the largest binary that I'd attempted.  My analysis style was somewhat immature and sporadic, and so you shouldn't try to emulate anything you see inside of it.  (It took another six months after this to perfect my static technique.)

I hope that the ProcDump authors aren't upset about this; after all, ProcDump is nine years old and has since been succeeded by ImpRec, OllyDump, NTICEDUMP, etc.  Greets to the ProcDump team, and thanks for their valuable contribution (which ultimately shaped the direction of dynamic tools for years to come).

IDA's IDS Files

Originally published June 7, 2007 on OpenRCE.

This topic comes up occasionally, so it's worth a quick investigation.  Your IDA directory has a subdirectory called 'ids' that contains more directories, which in turn contain .IDS files.  .IDS files do two things:  they define a mapping between ordinal numbers and symbol names (which may be mangled, and may contain the number of function arguments and their types), and secondly they allow (optional) comments for those functions.

The IDSUtil Package from Hex-Rays' website (only available to customers) provides tools to create .IDT files from statically-linked libraries and then to convert those into .IDS files.  .IDT files are flat text files whose syntax is described in the readme.txt inside of the IDSUTIL package.  

The 'ar2idt' tool produces an .IDT file from a .LIB.  Its command-line syntax is "ar2idt [filename].[lib/obj/o/etc.]" to produce [filename].IDT.  This tool supports several different object-file formats, as different compiler vendors use different ones.

Here's a sample from an .IDT file:

0 Name=MSGS.DLL
1 Name=??0CBaseMtm@@IAE@AAVCRegisteredMtmDll@@AAVCMsvSession@@@Z
2 Name=??0CBaseServerMtm@@IAE@AAVCRegisteredMtmDll@@PAVCMsvServerEntry@@@Z
3 Name=??0CMsgActive@@IAE@H@Z
4 Name=??0CMsvDefaultServices@@QAE@XZ
5 Name=??0CMsvEntrySelection@@QAE@XZ
313 Name=??0CMsvFindOperation@@IAE@AAVCMsvSession@@ABVTDesC16@@IAAVTRequestStatus@@@Z
314 Name=??0CMsvFindResultSelection@@QAE@XZ
6 Name=??0CMsvOperation@@QAE@AAVCMsvSession@@HAAVTRequestStatus@@@Z

After you have an .IDT file, the zipids.exe tool is used to turn an .IDT file into an .IDS file.  Its command-line is simply "zipids [filename].IDT" to create [filename].IDS.

A SymbianOS Example

While reverse engineering a SymbianOS worm in February 2006, I noticed that IDA wouldn't convert some by-ordinal imports from SymbianOS DLLs into their real names:

.idata:00405678 ;
.idata:00405678 ; Imports from PBKENG[101f4cce].DLL
.idata:00405678 ;
.idata:00405678 IMPORT __imp_PBKENG_18; DATA XREF: .text:off_404568
.idata:0040567C IMPORT __imp_PBKENG_21; DATA XREF: .text:off_4045A8
.idata:00405680 IMPORT __imp_PBKENG_43; DATA XREF: .text:off_404518
.idata:00405684 IMPORT __imp_PBKENG_72; DATA XREF: .text:off_404588
.idata:00405688 IMPORT __imp_PBKENG_73; DATA XREF: .text:off_404578
.idata:0040568C IMPORT __imp_PBKENG_101 ; DATA XREF: .text:off_404528
.idata:00405690 IMPORT __imp_PBKENG_110 ; DATA XREF: .text:off_404538
.idata:00405694 IMPORT __imp_PBKENG_173 ; DATA XREF: .text:off_404508
.idata:00405698 IMPORT __imp_PBKENG_180 ; DATA XREF: .text:off_404548
.idata:0040569C IMPORT __imp_PBKENG_185 ; DATA XREF: .text:off_404558
.idata:004056A0 IMPORT __imp_PBKENG_254 ; DATA XREF: .text:off_404598

I installed the SymbianOS SDK and then came up with a convoluted series of scripts wrapped around the GNU tool suite that would extract the function names and their ordinals from the relevant .LIB, and then create an IDC script that would rename any import-by-ordinal to its real name.  A friend chuckled at this Rube Goldberg-esque contraption and suggested that I use the IDSUTIL package instead.

It couldn't be easier:  just type "ar2idt pbkeng.lib && zipids pbkeng.idt" to produce an .IDS file for the pbkeng.lib static library.  Now inside of IDA, go to File->Load File->IDS File, and select the .IDS file that was created.  Alternatively, you can put this in the %IDA%\ids\epoc6\arm directory to have IDA load it automatically (after a restart).  Here are the results of applying it:

.idata:00405678 ;
.idata:00405678 ; Imports from PBKENG[101f4cce].DLL
.idata:00405678 ;
.idata:00405678 ; CPbkContactItem::CardFields(void)const
.idata:00405678 IMPORT CardFields__C15CPbkContactItem
.idata:00405678 ; DATA XREF: .text:off_404568
.idata:0040567C ; CPbkContactEngine::CloseContactL(long)
.idata:0040567C IMPORT CloseContactL__17CPbkContactEnginel
.idata:0040567C ; DATA XREF: .text:off_4045A8
.idata:00405680 ; CPbkContactEngine::CreateContactIteratorLC(int)
.idata:00405680 IMPORT CreateContactIteratorLC__17CPbkContactEnginei
.idata:00405680 ; DATA XREF: .text:off_404518
.idata:00405684 ; CPbkFieldInfo::FieldId(void)const
.idata:00405684 IMPORT FieldId__C13CPbkFieldInfo
.idata:00405684 ; DATA XREF: .text:off_404588

MFC Example

Let's see how to convert the MFC .DEF file into an .IDS file.  First, here's a snippet from the .DEF file:

; This is a part of the Microsoft Foundation Classes C++ library.
; Copyright (C) 1992-1998 Microsoft Corporation
; All rights reserved.

LIBRARY MFC42 

EXPORTS
DllGetClassObject @ 1 PRIVATE
DllCanUnloadNow @ 2 PRIVATE
DllRegisterServer @ 3 PRIVATE
DllUnregisterServer @ 4 PRIVATE
?classCCachedDataPathProperty@CCachedDataPathProperty@@2UCRuntimeClass@@B @ 5 DATA
?classCDataPathProperty@CDataPathProperty@@2UCRuntimeClass@@B @ 6 DATA
; MFC 4.2(final release)
??0_AFX_CHECKLIST_STATE@@QAE@XZ @ 256 NONAME

We can see that lines starting with a ";" are comments, any line containing the string " @ " is an actual export declaration, and everything else is part of the DEF file structure.  We only want the export declarations.  Let's run a quick sed/awk script on the .DEF file:

sed -e '/^ *;/d' MFC42.def | sed -n -e '/ @ /p' | gawk '{ print $3 " Name="$1 }' > MFC42.idt && zipids MFC42.idt

The first part of that command erases any comment-lines (those that begin with any number of spaces and then a semi-colon); the second part accepts any line that contains the string " @ "; and the third part converts the results into the .IDT file format.

To complete the job, we need to manually add a line that says "0 Name=MFC42.dll" to the top of the file.  Also, be sure to name the .IDT file the same as the DLL/LIB base name, e.g. mfc42.idt.  As before, we then run zipids on it to produce an .IDS file, which can be loaded into IDA and/or put into the %IDA%\ids directory to have it loaded automatically when appropriate.
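
If sed and gawk are not at hand, the same conversion can be sketched in Python, with the header line added in the same pass. The function name def_to_idt is hypothetical, and the output format is inferred only from the .IDT sample shown earlier; the IDSUTIL readme remains the authority on the syntax.

```python
def def_to_idt(def_text, dll_name):
    """Convert the export lines of a .DEF file into .IDT text."""
    lines = [f"0 Name={dll_name}"]  # header line naming the DLL
    for line in def_text.splitlines():
        if line.lstrip(" ").startswith(";"):
            continue  # comment line, like sed '/^ *;/d'
        if " @ " not in line:
            continue  # not an export declaration
        fields = line.split()
        if len(fields) < 3:
            continue
        # fields: name, "@", ordinal, optional attributes
        lines.append(f"{fields[2]} Name={fields[0]}")
    return "\n".join(lines) + "\n"

sample = """; MFC 4.2(final release)
LIBRARY MFC42
EXPORTS
DllGetClassObject @ 1 PRIVATE
??0_AFX_CHECKLIST_STATE@@QAE@XZ @ 256 NONAME
"""
print(def_to_idt(sample, "MFC42.dll"), end="")
```

The resulting text would then be saved as mfc42.idt and passed to zipids as described.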

Before applying the .IDS file:

.idata:4BB710DC extrn __imp_MFC42_6467:dword ; DATA XREF: MFC42_6467

Afterwards:

.idata:4BB710DC ; public: __thiscall AFX_MAINTAIN_STATE2::AFX_MAINTAIN_STATE2(class AFX_MODULE_STATE *)
.idata:4BB710DC extrn ??0AFX_MAINTAIN_STATE2@@QAE@PAVAFX_MODULE_STATE@@@Z:dword

Shellcode Analysis

Originally published April 3, 2007 on OpenRCE.

Here is a simple IDA trick that I use for shellcode analysis.  API functions in shellcode are typically looked up dynamically based upon the DLL's base address and a 32-bit hash of the function's name (GetProcAddress via hashing), like so:

seg000:00000268    push    0EC0E4E8Eh ; is actually LoadLibraryA
seg000:0000026D    push    eax
seg000:0000026E    call    sub_29D

The most commonly-seen API hashing function is the following one:

seg000:000002C6 loc_2C6:
seg000:000002C6    lodsb
seg000:000002C7    test    al, al
seg000:000002C9    jz      short loc_2D2
seg000:000002CB    ror     edi, 0Dh
seg000:000002CE    add     edi, eax
seg000:000002D0    jmp     short loc_2C6

Since shellcode often makes use of well-known subsets of the Windows API (such as WinExec, CreateProcess, CreateFile, MapViewOfFile, the sockets API, the wininet API, etc), it is often obvious from the context which API functions are being used.  Nevertheless, occasionally you'll have to reverse a hash into an API name, and that can quickly become annoying.

My solution to this is a small Python script, based upon Ero's pefile, that creates an IDC declaration of an IDA enumeration for each DLL.  The enum serves as a mapping between each exported name and its hash.  Since the API hashing function may change, the Python function to do this is extensible via a function pointer which defaults to the standard hash presented above.
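
For reference, the hashing loop shown earlier can be written in a few lines of Python. This is a sketch, not the script itself: it omits the pefile and IDC parts, takes a plain list of export names, and assumes the upper 24 bits of eax are zero while the loop runs. The names ror13_hash and hash_table are hypothetical; the hash_fn parameter plays the role of the replaceable function pointer.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """lodsb / ror edi, 0Dh / add edi, eax, repeated until the NUL byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h

def hash_table(export_names, hash_fn=ror13_hash):
    """Map each hash back to its export name; swap hash_fn for other schemes."""
    return {hash_fn(name): name for name in export_names}

table = hash_table(["LoadLibraryA", "GetProcAddress", "WinExec"])
print(table.get(0xEC0E4E8E))
```

Looking up 0EC0E4E8Eh in the table recovers the name pushed in the first listing.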

After creating the IDC script and loading it into IDA, simply press 'm' with your cursor over the hash value.  IDA will either find the hash in one of the enumerations or tell you that it can't find it, in which case either your hash function implementation is buggy, or the function lies inside of a DLL whose hashed export names are not yet loaded as an enum.  A successful result is:

seg000:00000268 push kernel32_apihashes_LoadLibraryA
seg000:0000026D push eax
seg000:0000026E call sub_29D

One nice thing about this method is that IDA will search all loaded enumerations for the hash value.  I.e. you don't need to tell IDA which DLL base address is being passed as arg_0:  if it knows the hash, it will tell you both the name of the export and the name of the enumeration that it came from.  Using a named enumeration element eliminates the need for a comment at each API call site.  Another nice thing is that, since the hash presented above is so common, you can create the enumerations once and put them into a big 'shellcode.idc' file and then immediately apply them to any shellcode using this hash (such as some recent in-the-wild ANI exploits, or the HyperUnpackMe2 VM from my most recent OpenRCE article) without missing a beat.

Porting everything to IDAPython and thereby removing the dependency upon the IDC script is left as a simple exercise for the reader.