There are a few options for profiling or performing code-coverage analysis on a per-module binary level:
* Run traces (very slow and generate a huge amount of uninteresting data, but it works);
* MSR tracing (strengths and weaknesses remain to be seen, but seems fairly promising);
* BinNavi/CoverIt/PaiMei/presumably Inspector: put a breakpoint on every function found in a static disassembly (doesn't work in general; I explained why here).
There are more options rooted in academia, the most practical of which is dynamic binary instrumentation (DBI), the technology behind tools such as Valgrind and DynamoRIO. The inner workings of this technology are very interesting, but they are rather involved, and their precise technical details are beyond the scope of this entry. Informally speaking, these tools disassemble a basic block, convert the instructions into an intermediate language (IL) like the ones you find inside of a compiler, and finally re-compile the IL with the "instrumentation" code baked directly into the newly generated machine code. For more information, read the original Ph.D. thesis describing Valgrind and then read the source to libVEX, a component thereof. Valgrind is slow and Linux-only, but DynamoRIO was specifically designed with speed in mind (hence the "Dynamo") and runs on Windows.
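To make the "bake the instrumentation into the recompiled block" idea a little more concrete, here is a toy sketch in Python. The IL here is entirely made up for illustration (a list of mnemonic/operand pairs, with a hypothetical `log_call` op); it is not VEX or DynamoRIO's real representation, and a real DBI framework does far more than this.

```python
from typing import List, Tuple

# Toy IL: each op is (mnemonic, operand). Hypothetical; not VEX or DynamoRIO IR.
Op = Tuple[str, str]


def instrument_block(block: List[Op]) -> List[Op]:
    """Return a copy of the block with a logging op inserted before every call."""
    out: List[Op] = []
    for mnemonic, operand in block:
        if mnemonic == "call":
            # The instrumentation becomes part of the block itself.
            out.append(("log_call", operand))
        out.append((mnemonic, operand))
    return out


# A lifted basic block, before and after instrumentation.
block = [("mov", "eax, 1"), ("call", "0x401000"), ("ret", "")]
instrumented = instrument_block(block)
```

The point is only that the logging op lives inside the rewritten block, so no breakpoint or single-step trap is needed when the call executes.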
Here I present a DynamoRIO extension for code coverage and profiling. It works at the function level (although block-level support could be added easily; the source weighs in at a measly 70 lines in 2 KB, so if you want some other feature, just code it), and it can serve as either a profiler or a code coverage analyzer. All it does is instrument the code such that each call instruction, direct or indirect, writes its source and target addresses into a file. This data can then be used for either profiling or code coverage purposes: simply discard all of the duplicates for the latter, and use the data as-is for the former. This is just the back-end, but I imagine that it could be easily integrated into PaiMei's front end to provide an industrial-grade coverage and profiling tool.
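The "discard duplicates for coverage, keep them for profiling" step might look something like the following Python sketch. The log format is an assumption on my part (one call per line, two hexadecimal addresses separated by whitespace); adjust the parser to whatever the extension actually writes. The function names are hypothetical as well.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Call = Tuple[int, int]  # (source address, target address)


def parse_call_log(lines: Iterable[str]) -> List[Call]:
    """Parse lines of the ASSUMED form '<src hex> <dst hex>'."""
    calls: List[Call] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        calls.append((int(parts[0], 16), int(parts[1], 16)))
    return calls


def coverage(calls: Iterable[Call]) -> Set[int]:
    """Coverage: duplicates are discarded, leaving the set of call targets hit."""
    return {dst for _, dst in calls}


def profile(calls: Iterable[Call]) -> Counter:
    """Profiling: keep the duplicates and count how often each target is called."""
    return Counter(dst for _, dst in calls)


# Made-up sample data, standing in for the extension's output file.
sample = ["00401000 00402000", "00401010 00402000", "00401020 00403000"]
calls = parse_call_log(sample)
covered = coverage(calls)
hottest = profile(calls).most_common(1)
```

In practice you would pass an open file object to `parse_call_log` instead of the sample list.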
Strengths of DynamoRIO:
* speed (you might not even notice the slowdown);
* stability (there used to be a commercial security product based on this technology -- it is literally industrial grade);
* trivial to code extensions for (70 lines, 2 KB for this simple yet powerful extension).
Weaknesses of DynamoRIO:
* definitely won't work with self-modifying code;
* probably won't work with obfuscated or "self-protecting" code (there's particularly a problem with so-called "pc-relative" addressing, such as call $+5 / pop ebp).
Studious readers may note that automatic indirect call resolution is exceptionally useful for C++ reverse engineering; comment out the direct call resolution, recompile, write a quick IDC script to add the x-refs to the disassembly listing, and you've got a killer C++ RE tool. Credit goes to spoonm for having and implementing this idea initially.
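As a rough sketch of the x-ref step, the following Python turns recorded call pairs into an IDC script. It assumes the logged source address is the address of the call instruction itself, and it uses the classic IDC name `AddCodeXref` with the `fl_CN` and `XREF_USER` constants; newer IDA releases call this function `add_cref`, so check your version. `idc_xref_script` is a name made up for this example.

```python
from typing import Iterable, Tuple


def idc_xref_script(calls: Iterable[Tuple[int, int]]) -> str:
    """Build IDC source that adds one user-defined call x-ref per unique pair."""
    lines = ["#include <idc.idc>", "static main() {"]
    for src, dst in sorted(set(calls)):
        # fl_CN marks a near call; XREF_USER flags the x-ref as user-defined.
        lines.append(f"    AddCodeXref(0x{src:X}, 0x{dst:X}, fl_CN | XREF_USER);")
    lines.append("}")
    return "\n".join(lines)


# Made-up pairs; the duplicate collapses into a single x-ref.
pairs = [(0x401000, 0x402000), (0x401000, 0x402000), (0x401010, 0x403000)]
script = idc_xref_script(pairs)
```

Write `script` to a `.idc` file and run it from within IDA to add the x-refs to the listing.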
Note that in the six years since this was published, new binary instrumentation tools such as Intel's Pin have emerged that ameliorate some of the weaknesses of the tools described in this post from 2008.