Switch as Binary Search, Part 0

Originally posted on OpenRCE on November 28th, 2008

For a standard switch with a contiguous region of cases, as in the below, compilers generate constant-time lookups by treating the cases as indexes into an array of pointers.

switch(x)
{
  case 10: /* case 10 code */ break;
  // No case 11 (default)
  case 12: /* case 12 code */ break;
  case 13: /* case 13 code */ break;
  case 14: /* case 14 code */ break;
  // No case 15 (default)
  case 16: /* case 16 code */ break;
  default: /* default code */ break;
}

This might compile into the following:

sub eax, 10 
cmp eax, 6 
ja @default 
jmp dword ptr switch_table[eax*4] 

switch_table dd offset loc_case0
             dd offset default
             dd offset loc_case2
             dd offset loc_case3
             dd offset loc_case4
             dd offset default
             dd offset loc_case6
             ... 

For switches with multiple labels pointing to the same code, some compilers save space by generating doubly-indexed, tabular code.

switch(x)
{
  case 0x00..0x20: /* 0x00 <= x < 0x20 code */ break;
  case 0x21..0x40: /* 0x21 <= x < 0x40 code */ break;
}

This might compile to:

cmp eax, 40h 
ja @over 
movzx eax, byte ptr index_table[eax*4] 
jmp dword ptr switch_table[eax*4] ; 
72 bytes instead of 256 
index_table  db 32 DUP(0)
             db 32 DUP(1) 
switch_table dd offset loc_case_00_00
             dd offset loc_case_20_40

For switches with non-contiguous, sparsely-distributed cases like the below, the table-based approaches from above are unsuitable, and yet cases like this do appear in practice, so the compiler must have a strategy for dealing with them.

switch(x)
{
  case 0x0:
  case 0x16:
  case 0x326:
  case 0x9821:
  case 0x90826:
  case 0x278946:
  case 0x4578939:
  case 0x12372826:
  default:
}

One obvious solution to this problem would be to generate a sequence of comparisons, one for each case, starting from the lowest case and ending with the highest.  This would work, but the lookup would be O(N) in the number of case labels.  An equally obvious solution, and one that was mentioned in the book "Reversing", is to generate a binary search construct in the generated code to perform lookups in O(log(N)) time.


To briefly review binary searching, consider the (necessarily) sorted sequence [1;2;3;4;5;6;7;8;9], and consider finding the value 2.  We begin by comparing against the middle element, 5. Since 2 is lesser, we discard all values 5 and above and consider those below it.  Now we have a smaller range of values upon which we can perform the same procedure.

[1;2;3;4]

As before, we compare against the middle element, 3.  Since it is lesser, we can discard anything above 3, leaving [1;2] as the remaining search space.

[1;2]

The middle element is 2, which is our search key, so we stop.  This process took three steps (log(9)) to complete, compared with N=9 steps for searching the whole array.

The compiler is responsible for generating code that implements these comparisons.  The first comparison will be against the middle element in the collection; whether the case value is above or below determines whether to repeat the process for the integers above or below the middle one.  At each comparison there is a corresponding equality test to see whether the search key is the same as the comparator.

Below is an example of this construct in the wild.

AUTO:00462141  cmp  eax, 0E3h 
AUTO:00462146  jnb  loc_46220F ; see below 
AUTO:0046214C  cmp  eax, 73h 
AUTO:0046214F  jnb  loc_463131 
AUTO:00462155  cmp  eax, 3Bh 
AUTO:00462158  jnb  loc_463606 
AUTO:0046215E  cmp  eax, 1Fh 
AUTO:00462161  jnb  loc_463835 
AUTO:00462167  cmp  eax, 10h 
AUTO:0046216A  jnb  loc_463948 
AUTO:00462170  cmp  eax, 8 
AUTO:00462173  jnb  loc_4639D1 
AUTO:00462179  cmp  eax, 4 
AUTO:0046217C  jnb  loc_463A1A 
AUTO:00462182  cmp  eax, 2 
AUTO:00462185  jnb  loc_463A2E  
AUTO:0046220F  ja   loc_462257 
AUTO:00462211  push ebx             ; case 0E3h 
AUTO:00462212  call sub_460920

Hex-Rays does a nice job of illustrating the time-consuming ugly confusingness of dealing with code like this:

if ( v7 <= 0x837FDF02 )
  goto LABEL_755;
if ( v7 >= 0x837FE22C )
{
  if ( v7 <= 0x837FE22C )
    return sub_45EA80(v4);
  if ( v7 < 0x837FE358 )
  {
    if ( 2088770818 != v5 )
      goto LABEL_10;
    goto LABEL_97;
  }
  if ( v7 <= 0x837FE358 )
    goto LABEL_750;
  if ( 2088770608 != v5 )
    goto LABEL_10;
  goto LABEL_100;
}
if ( v7 < 0x837FE0E2 )
  goto LABEL_10;
if ( v7 <= 0x837FE0E2 )
  goto LABEL_535;
if ( 2088771118 == v5 )
{
  sub_460880(v4);
  goto LABEL_23;
}

This entry is continued in part 1.