Polymorphic Type Inference for Machine Code
Matthew Noonan, Alexey Loginov, David Cok

TL;DR
This paper introduces Retypd, a novel type inference algorithm for machine code that recovers high-level type information, including polymorphism and subtyping, aiding reverse engineering and decompilation.
Contribution
The paper develops a new type system and algorithm supporting recursive polymorphic types and subtyping, improving accuracy and capabilities over existing methods.
Findings
Retypd achieves 98% recall in reconstructing pointer const annotations.
Retypd supports inference over weaker program representations than previous algorithms require.
Retypd yields more accurate inferred types than existing algorithms.
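To make the pointer const finding concrete, here is a minimal sketch of the underlying idea: a pointer parameter can be treated as const when the code only ever reads through it. This is a toy illustration under stated assumptions, not Retypd's actual algorithm; the `Access` record, the "load"/"store" labels, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One memory access through a pointer parameter (hypothetical toy IR)."""
    param: str  # name of the pointer parameter being dereferenced
    kind: str   # "load" or "store"


def pointer_capabilities(accesses):
    """Collect the set of access kinds observed for each pointer parameter."""
    caps = {}
    for access in accesses:
        caps.setdefault(access.param, set()).add(access.kind)
    return caps


def infer_const(accesses):
    """Mark a pointer parameter as const when no store through it is observed."""
    caps = pointer_capabilities(accesses)
    return {param: "store" not in kinds for param, kinds in caps.items()}


# A memcpy-like body: it reads through src and writes through dst.
body = [Access("src", "load"), Access("dst", "store")]
result = infer_const(body)
```

In this sketch `src` would be reported as const and `dst` would not. A real analysis also has to follow the pointer through callees and aliases, which this example deliberately ignores.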
Abstract
For many compiled languages, source-level types are erased very early in the compilation process. As a result, further compiler passes may convert type-safe source into type-unsafe machine code. Type-unsafe idioms in the original source and type-unsafe optimizations mean that type information in a stripped binary is essentially nonexistent. The problem of recovering high-level types by performing type inference over stripped machine code is called type reconstruction, and offers a useful capability in support of reverse engineering and decompilation. In this paper, we motivate and develop a novel type system and algorithm for machine-code type inference. The features of this type system were developed by surveying a wide collection of common source- and machine-code idioms, building a catalog of challenging cases for type reconstruction. We found that these idioms place a…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
