The previous blog entry touched on a form of compile-time protection that complicated static and dynamic analysis. Namely, the protection replaced control-transfer instructions with privileged instructions, followed by data indicating the location to which control should be transferred. At run-time, attempting to execute those privileged instructions raises an exception. The exception handler, in turn, catches the exception, performs the intended transfer, and resumes execution. The resulting disassembly listing is difficult to read in several respects.
The previous entry attacked the obfuscation to some extent, but we can do more. IDA processor module extensions are a perfect match for this problem. We can essentially trick IDA into thinking that the obfuscated instructions are their unobfuscated originals, so that the static analyst can read the disassembly listing (and use all of IDA's and Hex-Rays' functionality) as though the code had never been obfuscated. The processor module extension route provides a seamless, slipstream implementation that integrates directly into IDA's analysis facilities, thereby performing deobfuscation as the code is disassembled. Fast, cheap, and good: it turns out that you can have all three.
Control Transfer Obfuscation
The previous blog entry described one privileged instruction employed by the protection, and a technique for mitigating its deleterious effects on analysis. In reality, the protection employs three different privileged instructions, enumerated below. The first two use 16 bits' worth of data following the privileged instruction to describe the address to which control should be transferred. The third one uses an immediate constant within the instruction as an index into a function pointer table.
- in eax, dx / dw XXYYh => call 0x405000+XXYYh, as discussed previously.
- in al, dx / dw XXYYh => call 0x405000+XXYYh, where the code bytes at the destination are encrypted, and must be decrypted prior to execution.
- in al, XXh => call dword ptr [4011ACh+XX*4].
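The three mappings above can be sketched as a small, IDA-independent decoder. This is only an illustration: the function name and return format are hypothetical, the opcode bytes are the standard x86 encodings of the three `in` forms, and the sketch assumes the trailing word is stored little-endian, which the listing above does not state.

```python
import struct

CALL_BASE = 0x405000    # base added to the 16-bit word (from the list above)
TABLE_BASE = 0x4011AC   # function pointer table (from the list above)


def decode_obfuscated_call(code, offset=0):
    """Return (kind, address, size) for an obfuscated call, or None.

    Hypothetical helper: 'address' is the call target for direct calls,
    or the address of the function pointer slot for indirect calls.
    """
    if offset >= len(code):
        return None
    opcode = code[offset]
    if opcode in (0xED, 0xEC):          # in eax, dx / in al, dx
        if offset + 3 > len(code):
            return None
        # Assumption: the word following the opcode is little-endian.
        (disp,) = struct.unpack_from("<H", code, offset + 1)
        kind = "call" if opcode == 0xED else "call_encrypted"
        return kind, CALL_BASE + disp, 3
    if opcode == 0xE4:                  # in al, imm8
        if offset + 2 > len(code):
            return None
        index = code[offset + 1]
        return "call_indirect", TABLE_BASE + index * 4, 2
    return None                         # not one of the three patterns
```

For example, under the little-endian assumption, the bytes `ED 34 12` would be read as a call to 0x405000 + 0x1234.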
This form of protection poses an obvious hindrance to reverse engineering and automated analysis: the control-transfer instructions no longer exist, having been replaced by smaller, privileged instructions. As a result, the static analyst must resolve the original branch destinations manually.
Introduction of Bogus Instructions
The technical specifics of the protection mechanisms provide other irritations for static analysis. Upon decoding an instruction, most disassemblers will employ some logic to determine which address(es) should be decoded next. The typical logic is that, in the case of ...:
- A conditional jump or call, both the targeted address and the address following the jump or call should be decoded.
- An unconditional jump, only the targeted address should be decoded.
- A return instruction, no further instructions should be decoded.
- Any other instruction, the address following the present one should be decoded.
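The four rules above can be written down as a short function. This is a simplified model for illustration, not IDA's actual logic; the instruction kinds and the function name are made up for the sketch.

```python
def next_addresses(kind, addr, size, target=None):
    """Return the addresses a disassembler would queue after one instruction.

    kind is one of the hypothetical labels 'cond_jump', 'call', 'jump',
    'ret', or anything else for an ordinary instruction.
    """
    fallthrough = addr + size
    if kind in ("cond_jump", "call"):
        return [target, fallthrough]    # both the target and the next address
    if kind == "jump":
        return [target]                 # only the target
    if kind == "ret":
        return []                       # nothing further
    return [fallthrough]                # any other instruction


# A one-byte privileged instruction is an "other" instruction, so the
# address immediately after it is queued for decoding.
print(next_addresses("other", 0x1000, 1))
```

The last case is the one that matters for this protection: nothing tells the disassembler that the bytes after the privileged instruction are data.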
Since the privileged instructions fall into the last category in the list above, the disassembler will assume that the address following the instruction contains code and should be decoded. However, with this protection scheme, the address following the privileged instructions may contain data, and hence decoding such data will produce bogus instructions. x86's variable-length instruction encoding scheme magnifies the effects of this problem. When the data decodes to an instruction that is more than two bytes in length, the disassembler will miss valid instructions that begin after the data.
Altogether, the result is a messy disassembly listing that does not reflect which instructions actually execute at run-time. The following figure illustrates the problems discussed above. On the left, we see the obfuscated disassembly listing, with the obfuscated control transfers and bogus instructions indicated. On the right, I have manually cleaned up the listing, indicating the proper control transfers. Each of the types of privileged instructions already described is represented within the figure. Additionally, we see the occluding effects of the bogus instructions: at address 0040824A on the left, there is a three-byte instruction, causing the disassembler to miss the valid instruction at address 0040824C shown on the right, and producing the bogus instructions at addresses 0040824D and 0040824E on the left.
Encrypted Code Regions
The first variety of control-transfer obfuscation, listed above, merely masks calls to functions within the module. For the second variety, the code to which control is transferred is actually encrypted within the binary. The exception handler is responsible for allocating executable memory, decrypting the code, copying it into the allocated memory, and transferring execution there.
The "encryption" employed is more tedious than interesting. It consists of merely permuting and/or incrementing bytes within 8-byte blocks of the function's code. The permutation is controlled by a key, allowing each block to be permuted individually. Each such encrypted code region is preceded by 32 bits' worth of metadata:
- The length of the encrypted code (as a 16-bit integer),
- The key to use for permutation (as a byte),
- An unused byte.
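Reading that metadata might look like the sketch below. It assumes the fields appear in the order listed and that the 16-bit length is little-endian; neither detail is stated above, and the names `RegionHeader` and `parse_region_header` are invented for the example.

```python
import struct
from collections import namedtuple

# Hypothetical container for the 32 bits of metadata described above.
RegionHeader = namedtuple("RegionHeader", "length key unused")


def parse_region_header(data, offset=0):
    """Unpack the 4-byte header that precedes an encrypted code region.

    Assumptions: field order is length, key, unused; length is little-endian.
    """
    length, key, unused = struct.unpack_from("<HBB", data, offset)
    return RegionHeader(length, key, unused)


header = parse_region_header(bytes([0x10, 0x00, 0x5A, 0x00]))
print(header.length, hex(header.key))
```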
It is a simple matter to replicate the permutation logic and write a function implementing it.
IDA Processor Module Extensions
Despite the hassles it causes IDA and the reverse engineer, the obfuscation employed by this binary is particularly easy to bypass automatically. Once we are familiar with the obfuscation scheme, we can determine, every time we encounter one of the three privileged instructions in the disassembly listing, both the address to which control is being transferred and the nature of the transfer. And since IDA's disassembler logic is extensible by plugins, we can write a short piece of code to perform this recognition on our behalf, and automatically alter the listing such that the obfuscation no longer appears.
IDA processor module extensions allow plugin code to take control of the disassembler logic before the ordinary processor module has a chance to do so, in a manner similar to how filter drivers operate. In particular, IDA processor modules are implemented largely via callbacks that the IDA kernel invokes while disassembling a given binary. Processor module extensions can register callbacks that execute before the original processor module's. They can choose to either handle the events presented by the IDA kernel, or pass them on to the original processor module.
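The "extension first, original second" arrangement can be modeled in a few lines of plain Python. This is a toy model of the dispatch idea only; it does not use or reproduce IDA's API, and every name in it is hypothetical.

```python
class DecoderChain:
    """Toy model: extensions see each event before the original decoder."""

    def __init__(self, original):
        self.original = original     # stand-in for the ordinary processor module
        self.extensions = []         # callbacks that get the first chance

    def register(self, handler):
        self.extensions.append(handler)

    def decode(self, code, offset):
        for handler in self.extensions:
            result = handler(code, offset)
            if result is not None:   # the extension chose to handle the event
                return result
        return self.original(code, offset)   # otherwise pass it on


def original_decoder(code, offset):
    # Stand-in for the stock decoder: treats every byte as a 1-byte instruction.
    return ("stock", 1)


def in_eax_dx_extension(code, offset):
    # Claims only opcode 0xED; returns None to pass everything else on.
    if code[offset] == 0xED:
        return ("call", 3)
    return None


chain = DecoderChain(original_decoder)
chain.register(in_eax_dx_extension)
```

Here `chain.decode` returns the extension's answer for a byte sequence starting with 0xED, and falls back to the stand-in stock decoder for anything else.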
IDA processor modules are complex, but for the purpose of deobfuscating this binary, we only need to talk about the callback responsible for decoding instructions (namely, the ana() callback). That function, which is invoked when the kernel needs to decode an instruction:
- Consumes bytes from the instruction stream,
- Decodes the bytes to determine the specifics of the instruction and its operands,
- Sets fields inside of IDA's global cmd structure to represent the instruction.
Fortunately for us, the processor module extension mechanism is available in IDAPython. All we have to do is derive a class from idaapi.IDP_Hooks and hook ana() by implementing the custom_ana() method. The logic is trivial. We fetch a byte from the address at which the IDA kernel is requesting disassembly. If the byte is...:
- 0xED, this corresponds to the "in eax, dx" instruction, which is used to obfuscate direct call instructions. We consume the word following that byte, determine the call destination, and set up the cmd structure as though the instruction were "call dest_addr".
- 0xE4, this corresponds to the "in al, imm8" instruction, which is used to obfuscate indirect call instructions. We consume the following byte, determine which function pointer is being called, and set up the cmd structure as though the instruction were "call [dest_addr]".
- 0xEC, this corresponds to the "in al, dx" instruction, which is used to obfuscate direct call instructions to encrypted code regions. First, we consume the word following that byte to determine the call destination. Next, we need to decrypt the code regions and patch the database to reflect the decrypted bytes. Some care needs to be taken here so that we do not decrypt the same region twice. We make use of IDA's persistent storage features, called netnodes, to attach a marker to addresses that we've already decrypted. When we encounter this variety of obfuscated instruction, we check to see whether we've decrypted the bytes at the destination address already. If not, we decrypt the region and set the marker for the address. Finally, we set up the cmd structure as though the instruction were "call dest_addr".
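The decrypt-at-most-once bookkeeping in the last case can be illustrated outside of IDA. In this sketch a plain dict stands in for the netnode markers, a bytearray stands in for the database bytes, and the decryption routine is passed in as a parameter; the `toy_decrypt` function is a placeholder and is not the protection's actual algorithm.

```python
def decrypt_once(database, markers, dest, length, decrypt):
    """Decrypt database[dest:dest+length] in place unless already marked.

    Returns True if decryption happened, False if the region was skipped.
    All names here are hypothetical stand-ins for the IDA facilities.
    """
    if markers.get(dest):
        return False                          # already decrypted earlier
    plain = decrypt(bytes(database[dest:dest + length]))
    database[dest:dest + length] = plain      # "patch the database"
    markers[dest] = True                      # remember this address
    return True


def toy_decrypt(block):
    # Placeholder transformation only: subtract 1 from every byte.
    return bytes((b - 1) & 0xFF for b in block)


db = bytearray([0x00, 0x91, 0x91, 0xC4, 0x00])
seen = {}
first = decrypt_once(db, seen, 1, 3, toy_decrypt)    # decrypts the region
second = decrypt_once(db, seen, 1, 3, toy_decrypt)   # skipped: marker is set
```

Without the marker, the second visit would run the transformation again over bytes that are already plaintext and corrupt them.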
The resulting IDAPython processor module extension (password: "malware") is fewer than 100 lines of code, the majority of which is the logic for creating the proper instructions and decoding encrypted regions. To use the plugin, simply copy the .py file to %IDA%\plugins\.
Though the Python code may look simple, some complexity lurks near the setting of the processor module-specific fields cmd.specflags and cmd.Op[N].specval. For x86, many details can be found in the SDK's intel.hpp. Should you find yourself wanting to replicate this method on another binary, you might run into weird issues with respect to the output disassembly listing. Igor Skochinsky imparted a good debugging tip: find the type of instruction you want to replicate in a "clean", ordinary binary, dump its insn_t/op_t representations, and ensure that your replacements resemble the "clean" instructions. If you encounter bugs (especially related to cross-references or the display of the instruction/operands), they probably stem from deviations in these structures. I have provided Debug.py in the archive linked above, a (trivial) script implementing Igor's suggestion that I used for debugging.
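The comparison step of that tip can be sketched generically. This is not Debug.py; it is a hypothetical helper that diffs named attributes of any two objects, demonstrated on stand-in objects. The field names in `OP_FIELDS` are my recollection of op_t members from the IDA SDK of that era and should be checked against the headers for your version.

```python
from types import SimpleNamespace

# Assumed op_t field names; verify against your SDK before relying on them.
OP_FIELDS = ("type", "dtyp", "reg", "value", "addr", "specval",
             "specflag1", "specflag2", "specflag3", "specflag4")


def diff_fields(clean, ours, names=OP_FIELDS):
    """Return {field: (clean_value, our_value)} for every field that differs."""
    diffs = {}
    for name in names:
        a = getattr(clean, name, None)
        b = getattr(ours, name, None)
        if a != b:
            diffs[name] = (a, b)
    return diffs


# Stand-in objects with made-up values, in place of real operand structures.
clean_op = SimpleNamespace(type=7, dtyp=2, addr=0x406234, specval=0)
our_op = SimpleNamespace(type=7, dtyp=0, addr=0x406234, specval=0)
print(diff_fields(clean_op, our_op))
```

Any field that shows up in the result is a candidate explanation for a missing cross-reference or an oddly displayed operand.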