Practical Type Inference: High-Throughput Recovery of Real-World Structures and Function Signatures
Lukas Seidel, Sam Thomas, Konrad Rieck

TL;DR
This paper introduces XTRIDE, a high-throughput, practical approach for recovering real-world data structures and function signatures from stripped binaries, significantly improving speed and accuracy over previous methods.
Contribution
XTRIDE is an optimized n-gram-based method that achieves comparable performance to state-of-the-art techniques while being 70 to 2300 times faster, enabling practical deployment in automated reverse engineering pipelines.
Findings
Achieves 90.15% overall type inference accuracy.
Outperforms current state-of-the-art on the DIRT dataset by 5.09 percentage points.
Enables effective function signature recovery in embedded firmware.
Abstract
The recovery of types from stripped binaries is a key to exact decompilation, yet its practical realization suffers. For composite structures in particular, both layout and semantic fidelity are required to enable end-to-end reconstruction. Many existing approaches either synthesize layouts or infer names post-hoc, which weakens downstream usability. This is further aggravated by an excessive runtime overhead that is especially prohibitive in automated environments. We present XTRIDE, an improved n-gram-based approach that focuses on practicality: highly optimized throughput and actionable confidence scores allow for deployment in automated pipelines. When compared to the state of the art in struct recovery, our method achieves comparable performance while being between 70 and 2300 times faster. As our inference is grounded in real-world types, we achieve the highest ratio of…
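The abstract describes an n-gram-based lookup that returns confidence scores but does not spell out the mechanism. The following is a minimal illustrative sketch of that general idea, not XTRIDE's actual implementation. The class name `NGramTypePredictor`, the token strings such as `ld8@0`, the back-off strategy, and the vote-share confidence are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class NGramTypePredictor:
    """Toy n-gram type predictor (hypothetical, for illustration only)."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        # One table per n-gram length: n-gram tuple -> Counter of type labels.
        self.tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}

    def train(self, tokens, type_label):
        """Record every n-gram of the usage tokens under the known type."""
        for n in range(1, self.max_n + 1):
            for i in range(len(tokens) - n + 1):
                self.tables[n][tuple(tokens[i:i + n])][type_label] += 1

    def predict(self, tokens):
        """Return (type_label, confidence), backing off from long to short n-grams."""
        for n in range(self.max_n, 0, -1):
            votes = Counter()
            for i in range(len(tokens) - n + 1):
                votes.update(self.tables[n].get(tuple(tokens[i:i + n]), {}))
            if votes:
                label, count = votes.most_common(1)[0]
                # Confidence here is simply the winning label's share of votes.
                return label, count / sum(votes.values())
        return None, 0.0


predictor = NGramTypePredictor(max_n=3)
# Assumed token format: abstracted memory accesses on a variable.
predictor.train(["ld8@0", "ld8@8", "st8@8"], "struct list_head")
predictor.train(["ld4@0", "add", "st4@0"], "int")

label, conf = predictor.predict(["ld8@0", "ld8@8", "cmp"])
unknown = predictor.predict(["nop"])
```

In an automated pipeline, a caller could apply a prediction only when `conf` exceeds a chosen threshold and leave the variable untyped otherwise, which is one way a confidence score becomes actionable.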
Taxonomy
Topics: Machine Learning in Materials Science · Natural Language Processing Techniques · Topic Modeling
