- Basic automata/formal languages/Turing machines; Siper's "Introduction to the Theory of Computation" is recommended here.
- Basic programming language theory; I used University of Washington CSE P505 online video lectures and materials and can recommend it.
- Formal semantics; Semantics with Applications is good.
- Compilers. You'll need several resources for this; my personal favorites for an introductory text are Appel's ML book or Programming Language Pragmatics, and Muchnick's "Advanced Compiler Design and Implementation" is mandatory for an advanced understanding. All of the graph theory that you need for this type of work should be covered in books such as these.
- Algorithms. I used several books; for a beginner's treatment I recommend Dasgupta, Papadimitriou, and Vazirani's "Algorithms" ; for an intermediate treatment I recommend MIT's 6.046J on Open CourseWare (which uses the famous Algorithms textbook) ; for an advanced treatment, I liked Algorithmics for Hard Problems .
Mathematics: I was advantaged in that I did my undergraduate degree in the subject. Here's what I can recommend, given five years' worth of hindsight studying program analysis:
- You run into abstract algebra a lot in program analysis as well as in cryptography, so it's best to begin with a solid foundation along those lines. There's a lot of debate as to what the best text is. If you're never touched the subject before, Gallian is very approachable, if not as deep and rigorous as something like Dummit and Foote .
- Order theory is everywhere in program analysis. Introduction to Lattices and Order is the standard (read at least the first two chapters; the more you read, the better), but I recently picked up Lattices and Ordered Algebraic Structures and am enjoying it.
- Complexity theory. Arora and Barak is recommended.
- Formal logic is also everywhere. For this, I recommend the first few chapters in The Calculus of Computation (this is an excellent book; read the whole thing).
- Computability, undecidability, etc. Not entirely separate from previous entries, but read something that treats e.g. Goedel's theorems, for instance The Undecidable .
- Decision procedures. Read Decision Procedures.
- For an extensive treatment of SAT solvers and their applications, read Handbook of Satisfiability. For something math-ier, try Introduction to the Mathematics of Satisfiability.
Program analysis, the "accessible" variety.
- Read the BitBlaze publications starting from the beginning, followed by the BAP publications . Start with these two: TaintCheck and All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution. (BitBlaze and BAP are available in source code form, too -- in OCaml though, so you'll want to learn that as well: try Real World OCaml.) David Brumley's Ph.D. thesis is an excellent read, as is David Molnar's and Sean Heelan's Master's thesis . After that, look through the archives of the reverse engineering reddit for papers on the "more applied" side of things. You may also want to read the source code to dynamic taint analysis and dynamic binary instrumentation tools: Flayer, TEMU, Argos, Valgrind, DynamoRIO.
Program analysis, the "serious" variety.
- Principles of Program Analysis is an excellent book, but you'll find it very difficult even if you understand all of the above. Similarly, Cousot's MIT lecture course is great but largely unapproachable to the beginner. I highly recommend Value-Range Analysis of C Programs , which is a rare and thorough glimpse into the development of an extremely sophisticated static analyzer. Although this book is heavily mathematical, it's substantially less insane than Principles of Program Analysis. I also found Gogul Balakrishnan's Ph.D. thesis , Johannes Kinder's Ph.D. thesis , Mila Dalla Preda's Ph.D. thesis, Antoine Mine's Ph.D. thesis, and Davidson Rodrigo Boccardo's Ph.D. thesis useful.
- If you've gotten to this point, you'll probably begin to develop a very selective taste for program analysis literature: in particular, if it does not have a lot of mathematics (actual math, not just simple concepts formalized), you might decide that it is unlikely to contain a lasting and valuable contribution. At this point, read papers from CAV, SAS, and VMCAI. Some of my favorite researchers are the Z3 team, Mila Dalla Preda, Joerg Brauer, Andy King, Axel Simon, Roberto Giacobazzi, and Patrick Cousot.
- Although I've tried to lay out a reasonable course of study hereinbefore regarding the mathematics you need to understand this kind of material, around this point in the course you'll find that the creature we're dealing with here is an octopus whose tentacles spread in every direction. In particular, you can expect to encounter topology, category theory, tropical geometry, numerical mathematics, and many other disciplines. Program analysis is multi-disciplinary and has a hard time keeping itself shoehorned in one or two corners of mathematics.
- After several years of wading through program analysis, you start to understand that there must be some connection between theorem-prover based methods and abstract interpretation, since after all, they both can be applied statically and can potentially produce similar information. But what is the connection? Recent publications by Vijay D'Silva et al (1, 2, 3, 4, 5) and a few others (1 2 3 4) have begun to plough this territory.
Final bit of advice: you'll notice that I heavily stuck to textbooks and Ph.D. theses in the above list. I find that jumping straight into the research literature without a foundational grounding is perhaps the most ill-advised mistake one can make intellectually. To whatever extent that what you're interested in is systematized -- that is, covered in a textbook or thesis already, you should read it before digging into the research literature. Otherwise, you'll be the proverbial blind man with the elephant, groping around in the dark, getting bits and pieces of the picture without understanding how it all forms a cohesive whole. I made that mistake and it cost me a lot of time; don't do the same.