Originally published on November 28th, 2008 on OpenRCE

(Continued from part 0)

We have seen how a compiler-generated binary search partitions the range of integers into successively smaller intervals and ultimately converging upon either a single value or one contiguous range of values. We will now develop a tool to deal with this construct in a friendly fashion for the working reverse engineer.

The tool has the following high-level specification:

Given the first address of a compiled binary switch construct:

- Determine the register used and whether the comparisons are signed (trivial)
- Determine all terminal labels, at which cases reside, and all values that lead to these cases.

A discussion of the problem and its solution follows.

Upon entering the switch, the register value is entirely unconstrained: the range of values that it might have are contained in the interval [0, MAX_INT] (or [-(MAX_INT+1)/2, MAX_INT/2] for signed integers). The following example uses an unsigned integer.

Suppose that the first comparison is of the form

@loc0: cmp eax, val1 jbe @loc1 @loc2:

If the jump is taken, the value must be below or equal to 'val1'. In other words, it lies in the range [0, val1]. If the jump is not taken, the value must be above 'val1', so it lies in the range [val1+1, MAX_INT]. Therefore, each comparison in the tree partitions the range of possible values into two disjoint ranges: one for those values where the jump is taken, and one for those where it is not.

At the second level of the tree, there will be two comparisons: one for the case where the jump for the first was not taken, and one for the case where it was. Consider the second set of such instructions:

@loc1: cmp eax, val2 jbe @loc3 @loc4: @loc2: cmp eax, val3 jbe @loc5 @loc6:

Each of these cases further constrains the input:

- The range of values leading to loc3 is [0, val2]
- The range of values leading to loc4 is [val2+1, val1]
- The range of values leading to loc5 is [val1+1, val3]
- The range of values leading to loc6 is [val3+1, MAX_INT]

This table has a very regular structure, and it should not be too hard to imagine what it would look like for three levels into the tree, or four, or... The following image summarizes the process. The unique path to a given vertex specifies the constraints required for input to reach it.

As we walk the comparisons in the tree in this fashion, the binary tree will eventually stop and give way to the code for the cases in the switch. At this point, our constraints will be terminal cases of either single values (most commonly) or of simple ranges. For each terminal address, we maintain a dictionary associating it with the corresponding ranges that lead there.

The following pseudocode codifies the discussion above.

```
// The first range returned is that for which the jump is taken;
// the second is for non-jump-taking values
partition(ea, low, high, compared)
{
if(comparison at ea is ">")
return [compared+1, high], [low, compared]
if(comparison at ea is ">=")
return [compared, high], [low, compared+1]
if(comparison at ea is "<")
return [low, compared-1], [compared, high]
if(comparison at ea is "<=")
return [low, compared], [compared+1, high]
if(comparison at ea is "==")
return [compared, compared], [low, high]
if(comparison at ea is "!=")
return [low, high], [compared, compared]
}
```

The following recursive algorithm solves the general problem.

```
analyze_bswitch(ea_t ea, int low, int high, int compared)
{
if(ea is cmp reg, constant)
compared = instruction's immediate value
ea = address of next instruction
if(ea is a conditional jump)
[low1,high2], [low2,high2] = partition(ea, low, high, compared)
analyze_bswitch(jump taken ea, low1, high1, compared)
analyze_bswitch(jump not taken ea, low2, high2, compared)
// Instruction is a leaf in the binary tree
else
associate(ea, low, high)
}
```

IDAPython-1.0 compatible source code is available here.

The resulting disassembly is properly annotated with case labels, as IDA does normally:

AUTO:0046348E jbe loc_463D16 ; case 0A0h AUTO:00463494 cmp eax, 0A2h AUTO:00463499 jb loc_462FA0 ; case 0837F81BAh, 0837F81D8h, 0A1h, 091h AUTO:0046349F ja loc_463D21 ; case 0837F90BAh, 0A3h AUTO:004634A5 loc_4634A5: ; CODE XREF: sub_462120:loc_462EDAj AUTO:004634A5 ; sub_462120+DDDj ... AUTO:004634A5 push ebx ; case 0837F8106h, 0837F8124h, 0A2h, 031h AUTO:004634A6 call sub_45C630 AUTO:004634AB jmp loc_462217