Here are the slides and (soon!) code for my recent presentation at Ekoparty 2016. The full abstract can be found below. In brief, this research involved automatically generating shellcodes when there are restrictions on legal encodings. We explore examples ranging from well-known ones (no NULL bytes, no '%' character, printable, alphanumeric, all letters uppercase, etc.) to weird and challenging ones (bytes must alternate even and odd, no duplicate bytes, all words are prime numbers, etc). We also explore automated encoding and decoder generation: e.g., given some existing shellcode, transform it into (for example) alphanumeric bytes, and generate a decoder to revert the encoding at run-time. We also explore tasks like finding the shortest or longest solutions, re-writing existing shellcodes into a given encoding, exploiting known facts about the input state, and integration with automated exploit generation.
SLIDES, CODE, AND INFORMATION
Slides with presenter notes: HERE
Video: HERE (does not include the material on decoder loop synthesis due to time constraints)
Information about the training classes mentioned in the presentation: HERE
The problems of shellcode generation and of memory corruption exploit development share a birthday. In brief, memory corruption exploits must trick a program into executing machine code ("shellcode") provided as input. Each individual exploit scenario may place constraints upon the allowable machine code bytes: NULL bytes (or any arbitrary bytes) may be disallowed; the input may be constrained to be alphanumeric; all ASCII characters may be required to be uppercase; certain characters may be filtered; the input may be transformed in arbitrary ways; the input may be required to lie within UTF-8 or UTF-16; and so on.
Historically, the security community has dealt with these problems on a case-by-case basis. Many papers were written regarding various processor architectures and some common encoding restriction schema. Generally, these publications describe patterns for performing common operations (setting registers to constants, obtaining the program counter, etc.) within the given encoding restriction. From these publications came shellcode encoding; rather than writing the entire shellcode within the encoding restriction, we encode a shellcode blob within the encoding, and generate a machine code program within that encoding to decode the blob and execute it. Shellcode encoders are useful, but they suffer from a number of issues. They expand the size of the shellcode blob, which can render an exploit unworkable. They often contain common sequences of machine code, for which IDS detections are readily available. They are not guaranteed to find an encoding and a decoder, even if one exists. In short, shellcode generation is still a real-world problem, despite the existence of shellcode encoders.
In this publication, we provide a novel technique based on program synthesis for creating machine code programs that fall within a given encoding. Essentially, Synesthesia is a compiler whose inputs are a specification of the desired functionality, together with a specification of the allowable encodings. Synesthesia enjoys a number of nice theoretical properties: it is guaranteed to find an encoding for the desired functionality if one exists; the user can search for the shortest program in terms of byte length or number of instructions; it does not rely upon pattern databases of any sort, so each run can potentially produce an entirely unique output; and it can produce self-modifying code. The ideas behind Synesthesia are not tied to any specific processor architecture, and do not require emulation, access to such a processor, or brute-forcing.
This presentation will discuss the context wherein Synesthesia exists, the concepts behind its design, case studies on more than one assembly language, a performance evaluation, and a discussion of the theoretical limitations (i.e., permanent issues) and practical ones (i.e., limitations of contemporary SMT solvers). Synesthesia shall be made available as open source.
Open the video on half of your screen and the slides on the other, switching through the slides as I do so during the video. The presentation uses a lot of in-frame animations, so for best results, you will want to view the PDF in contiguous rather than contiguous mode. I.e., only one slide should be on-screen at a time (no fractions of subsequent slides visible), and that advancing the slide should bring up an entirely new slide. This is easy to accomplish with the full-page mode of standalone PDF viewers. In Chrome's PDF viewer, you can use the left and right arrow keys to advance or retreat one slide at a time. Alternatively, there is an icon bar at the bottom right of each slide. The first two buttons from the left retreat and advance by one slide, respectively. Failing all of these options, use a different PDF viewer.