Code Generation Quirk Involving Array Indexing

Originally published on February 13th, 2008 on OpenRCE.

In this post, we shall investigate some strange-looking code that a compiler generated for an array access.

.text:10002D49 mov eax, [esp+arg_0]
.text:10002D4D lea ecx, [eax-9C40h]
.text:10002D53 cmp ecx, 50h
.text:10002D56 ja short loc_10002D60
.text:10002D58 mov eax, dword ptr ds:(loc_1000EF5B+1)[eax*8]
.text:10002D5F retn
.text:10002D60 loc_10002D60:
.text:10002D60 lea edx, [eax-0A029h]
.text:10002D66 cmp edx, 9
.text:10002D69 ja short loc_10002D73
.text:10002D6B mov eax, dword ptr ds:loc_1000D344[eax*8]
.text:10002D72 retn
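For illustration, here is hypothetical C source of the kind that could plausibly compile to a listing like this one. The names `entry`, `table_a`, `table_b`, `in_range` and `lookup` are invented. The second field of `entry` is also an assumption: the listing only loads the first dword of each 8-byte element. The fall-through return value is assumed too, since the listing does not show that path. The sketch shows how each `lea`/`cmp`/`ja` triple implements an inclusive range check with a single unsigned comparison. Because `ja` is unsigned, the accepted ranges are 0x9C40 through 0x9C90 (81 IDs) and 0xA029 through 0xA032 (10 IDs).

```c
#include <stdint.h>

/* Hypothetical 8-byte record; only the first dword is read in the listing. */
struct entry {
    uint32_t value;
    uint32_t other; /* assumed: contents unknown */
};

/* Hypothetical tables covering IDs 0x9C40..0x9C90 and 0xA029..0xA032. */
static struct entry table_a[0x9C90 - 0x9C40 + 1];
static struct entry table_b[0xA032 - 0xA029 + 1];

/* One unsigned compare implements lo <= id && id <= hi: when id < lo,
   id - lo wraps around to a huge value and fails the test. */
static int in_range(uint32_t id, uint32_t lo, uint32_t hi) {
    return id - lo <= hi - lo;
}

uint32_t lookup(uint32_t id) {
    if (in_range(id, 0x9C40u, 0x9C90u))   /* lea ecx,[eax-9C40h]; cmp ecx,50h; ja */
        return table_a[id - 0x9C40u].value;
    if (in_range(id, 0xA029u, 0xA032u))   /* lea edx,[eax-0A029h]; cmp edx,9; ja */
        return table_b[id - 0xA029u].value;
    return 0; /* assumed: the listing does not show the fall-through path */
}
```

Indexing with `id - 0x9C40` is what gives each table a nonzero lower bound.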

At the addresses referenced by the instructions at 10002D58 and 10002D6B we find no arrays at all; in fact, we find code, which is unusual:

; First target
.text:1000EF57 movzx eax, word ptr [esi+18h]
.text:1000EF5B loc_1000EF5B: ; DATA XREF: 10002D58
.text:1000EF5B add dword_10065280, eax
.text:1000EF61 xor eax, eax
.text:1000EF63 pop esi
.text:1000EF64 mov esp, ebp
.text:1000EF66 pop ebp

; Second target
.text:1000D342 mov esp, ebp
.text:1000D344 loc_1000D344: ; DATA XREF: 10002D6B
.text:1000D344 pop ebp

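A closer look at the code reveals the trick: the arrays are not indexed starting at zero. Because ja is an unsigned comparison, each lea/cmp/ja sequence admits only a narrow, inclusive range of eax values, and the lower bound of that range is nonzero: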

.text:10002D58 mov eax, dword ptr ds:(loc_1000EF5B+1)[eax*8] ; <- 0x9C40 <= eax <= 0x9C90
.text:10002D6B mov eax, dword ptr ds:loc_1000D344[eax*8] ; <- 0xA029 <= eax <= 0xA032

So the first array actually begins at 0x1000EF5B+1+0x9C40*8 == 0x1005D15C, and the second array begins at 0x1000D344+0xA029*8 == 0x1005D48C. What happened here is that the compiler simplified the pointer expression to fit x86's [disp32 + index*scale] addressing form, folding the constant index bias into the displacement:

[1005D15Ch + (eax - 9C40h)*8] => [1005D15Ch - 4E200h + eax*8] => [1000EF5Ch + eax*8]
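The folding can be checked with plain 32-bit unsigned arithmetic, which wraps modulo 2^32 just as the address computation does. The sketch below uses hypothetical helper names. `fold_displacement` performs the compiler's direction of the rewrite. `recover_base` performs the reverser's direction: it takes the displacement IDA shows and the lower bound from the range check, and returns the true start of the array. The scale of 8 and the bounds are taken from the listing.

```c
#include <stdint.h>

/* Compiler's direction: base + (i - lo)*scale == (base - lo*scale) + i*scale,
   so the constant base - lo*scale becomes the instruction's displacement. */
uint32_t fold_displacement(uint32_t base, uint32_t lo, uint32_t scale) {
    return base - lo * scale;
}

/* Reverser's direction: undo the folding to find where the array really starts. */
uint32_t recover_base(uint32_t disp, uint32_t lo, uint32_t scale) {
    return disp + lo * scale;
}

/* Address the instruction actually computes for index register value i. */
uint32_t effective_address(uint32_t disp, uint32_t i, uint32_t scale) {
    return disp + i * scale;
}
```

For example, `recover_base(0x1000D344, 0xA029, 8)` yields 0x1005D48C, the start of the second array. The displacement only looks like a code address because subtracting 0x4E200 (or 0x50148) from the real table address happens to land inside .text.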

This is pretty uncommon; I've only seen it a handful of times in my reversing endeavors over the years.