Detours into Arcana: A Static Malware Analysis Trick

Several friends asked for my advice on a particular technique seen in some recent malware samples, which is similar in nature to the Nanomites feature of the Armadillo copy protection.  There are three parts to this technique:

  1. When creating the malware binary, replace control transfer (call and jmp) instructions with privileged instructions.
  2. At run-time, install an exception handler.
  3. Catch the exceptions thrown by the privileged instructions, perform the intended control transfers, and resume execution.

A concrete example follows.

UPX0:004081F7 in eax, dx
UPX0:004081F8 dw 4EE0h

When the process attempts to execute this instruction, an exception is generated.  The exception handler checks to see that the faulting instruction is "in eax, dx", then reads the word following the instruction, and generates a call to location 0x405000 + 0x4EE0.  In other words, call instructions within the module are replaced by:

in eax, dx
dw call_destination - 0x405000

As malware analysts, we would like to deobfuscate the binary by replacing these privileged sequences with the original call instructions, both for our own sake as well as the ability to use the Hex-Rays decompiler (which otherwise balks at the use of privileged instructions).  However, the particular implementation within this sample poses a slight conundrum.  The sequence "in eax, dx / dw XXXXh" is three bytes long, whereas the original "call 409EE0h" is five bytes.  Therefore, we cannot merely rewrite the original instruction atop the privileged one without overwriting subsequent instructions.

A second idea is to use detouring, another staple technique in reverse engineering.  We could find or create some unused space within the binary, patch a jmp from the privileged instruction to that new location, recreate the original instruction at that location, and then patch a jmp back to the location after the privileged instruction.  However, this idea is flawed for the same reason:  a long jmp instruction is five bytes long, so we would also alter subsequent instructions.

Bill Gates questionably said "I choose a lazy person to do a hard job, because a lazy person will find an easy way to do it."  After some thought, I recalled a bit of x86 processor arcana that can help us fit our detours into the three-byte spaces provided by the obfuscator:  the address-size prefix, 0x67.  Quoth the Intel manuals:  "Address calculations are first truncated to the effective address size of the current mode, as overridden by any address-size prefix.  The result is then zero-extended to the full address width."  I.e., when operating in 32-bit mode, if we prefix an instruction that references an address with 0x67, the address will be truncated to 16-bits.  

To be specific, consider the following:

UPX1:00410320 EB 00 jmp near ptr unk_410322 
; jump to 0x00410320 + 2(length of jmp instruction) + 0x00 

When we place an address-size prefix on this instruction, we get:

UPX1:00410320          db 67h
UPX1:00410320 67 EB 00 jmp near ptr unk_323 
; jump to (0x00410320 + 3(length of jmp instruction) + 0x00) & 0xFFFF

To use this trick for deobfuscation, we must first create a segment at address 0x0 of length 0xFFFF.

Recalling our motivating example:

UPX0:004081F7 in eax, dx ; obfuscated call 0x405000 + 0x4EE0
UPX0:004081F8 dw 4EE0h 

Let us overwrite these bytes with 67 EB FD:

UPX0:004081F7 db 67h
UPX0:004081F7 jmp short loc_81F7

This is the first half of our detour.  Now, at location 81F7h, let's replicate the original control transfer instruction and add a jmp back to the location after the obfuscated sequence:

ROLFMSRE:000081F7 loc_81F7:
ROLFMSRE:000081F7 call    sub_409EE0
ROLFMSRE:000081FC jmp     loc_4081FA

And now we have accomplished our goal.  Once we have replaced the obfuscated control transfers, we human analysts can read the listing more easily, and Hex-Rays no longer has trouble decompiling the code.

Shellcode Analysis

Originally published April 3, 2007 on OpenRCE.

Here is a simple IDA trick that I use for shellcode analysis.  API functions in shellcode are typically looked up dynamically based upon the DLL's base address and a 32-bit hash of the function's name (GetProcAddress via hashing), like such:

seg000:00000268    push    0EC0E4E8Eh ; is actually LoadLibraryA
seg000:0000026D    push    eax
seg000:0000026E    call    sub_29D

The most commonly-seen API hashing function is the following one:

seg000:000002C6 loc_2C6:
seg000:000002C6    lodsb
seg000:000002C7    test    al, al
seg000:000002C9    jz      short loc_2D2
seg000:000002CB    ror     edi, 0Dh
seg000:000002CE    add     edi, eax
seg000:000002D0    jmp     short loc_2C6

Since shellcode often makes use of well-known subsets of the Windows API (such as WinExec, CreateProcess, CreateFile, MapViewOfFile, the sockets API, the wininet API, etc), it is often obvious from the context which API functions are being used.  Nevertheless, occasionally you'll have to reverse a hash into an API name, and that can quickly become annoying.

My solution to this is a small python script, based upon Ero's pefile, that creates an IDC declaration of an IDA enumeration for each DLL.  The enum serves as a mapping between each exported name and its hash.  Since the API hashing function may change, the Python function to do this is extensible via a function pointer which defaults to the standard hash presented above.

After creating the IDC script and loading it into IDA, simply press 'm' with your cursor over the hash value.  IDA will either find the hash in one of the enumerations or tell you that it can't find it, in which case either your hash function implementation is buggy, or the function lies inside of a DLL whose hashed export names are not yet loaded as an enum.  A successful result is:

seg000:00000268 push kernel32_apihashes_LoadLibraryA
seg000:0000026D push eax
seg000:0000026E call sub_29D

One nice thing about this method is that IDA will search all loaded enumerations for the hash value.  I.e. you don't need to tell IDA which DLL base address is being passed as arg_0:  if it knows the hash, it will tell you both the name of the export and the name of the enumeration that it came from.  Using a named enumeration element eliminates the need for a comment at each API call site.  Another nice thing is that, since the hash presented above is so common, you can create the enumerations once and put them into a big 'shellcode.idc' file and then immediately apply them to any shellcode using this hash (such as some recent in-the-wild ANI exploits, or the HyperUnpackMe2 VM from my most recent OpenRCE article) without missing a beat.

Porting everything to IDAPython and thereby removing the dependency upon the IDC script is left as a simple exercise for the reader.