Idioms: Neural Decompilation With Joint Code and Type Definition Prediction
Luke Dramko, Claire Le Goues, Edward J. Schwartz

TL;DR
This paper presents Idioms, a neural decompiler that jointly predicts code and user-defined types, significantly improving decompilation accuracy on realistic benchmarks and enabling better reverse engineering of compiled software.
Contribution
The work introduces Realtype, a challenging new dataset, and a novel neural decompilation method that jointly predicts code and types, surpassing existing models in accuracy.
Findings
Achieves 54.4% accuracy on ExeBench, outperforming prior models.
Performs at least 95% better than prior models on the Realtype dataset.
Achieves state-of-the-art results in neural decompilation.
Abstract
Decompilers are important tools for reverse engineers that help them analyze software at a higher level of abstraction than assembly code. Unfortunately, because compilation is lossy, deterministic decompilers produce code that is missing many of the details that make source code readable in the first place, like variable names and types. Neural decompilers, on the other hand, offer the ability to statistically fill in these details. Existing work in neural decompilation, however, suffers from substantial limitations that preclude its use on real code, such as the inability to define composite types, which is essential to fully specify function semantics. In this work, we introduce a new dataset, Realtype, that includes substantially more complicated and realistic types than existing neural decompilation benchmarks, and Idioms, a new neural decompilation approach to finetune any LLM…
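The abstract's point that composite type definitions are essential to function semantics can be illustrated with a small sketch. The Python snippet below is not from the paper. It uses the standard `struct` module to read the same eight bytes under two hypothetical struct layouts, invented here purely for illustration, and shows that the second field's meaning depends entirely on which definition is assumed.

```python
import struct

# Eight raw little-endian bytes, standing in for memory a decompiled
# function might access through a pointer.
raw = struct.pack("<if", 3, 1.5)

# Hypothetical layout A (an assumption for illustration):
#   struct { int32_t count; float scale; }
count, scale = struct.unpack("<if", raw)

# Hypothetical layout B (also an assumption):
#   struct { int32_t x; int32_t y; }
x, y = struct.unpack("<ii", raw)

# The first field agrees under both layouts, but the second field is
# a float under A and an unrelated integer bit pattern under B.
print("layout A:", count, scale)
print("layout B:", x, hex(y))
```

Without the type definition, an access to the second field is ambiguous. This is the kind of gap that predicting the code and its type definitions together is meant to close.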
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
