Decaf: Improving Neural Decompilation with Automatic Feedback and Search
Alexander Shypula, Osbert Bastani, Edward Schwartz

TL;DR
Decaf enhances neural decompilation accuracy by integrating automatic feedback and search, significantly increasing semantic correctness without losing source code similarity.
Contribution
The paper introduces Decaf, a novel system that uses compiler feedback and search to improve neural decompilation, outperforming previous methods.
Findings
Decompilation rate improved from 26.0% to 83.9%.
Automatic feedback significantly boosts neural decompiler performance.
Method is effective for weaker neural models.
Abstract
Decompilers are tools used in reverse engineering to understand compiled code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are generally lost as the compiler translates human-readable code to low-level machine code. Deterministic decompilers are useful for binary analysis, but can struggle to infer idiomatic syntax and identifier names. Generative AI models are a natural fit for reconstructing high-level syntax, identifiers, and types, but they can still hallucinate improper programming constructs and semantics. Instead of attempting to improve neural decompilers with more data and more training, we argue that compiler feedback can be used to dramatically improve the semantic correctness of neural decompiler outputs via search. Our system, Decaf (DECompilation with…
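The abstract's core idea, filtering neural decompiler outputs with automatic feedback during a search, can be illustrated with a minimal sketch. This is not the paper's implementation: the summary above does not specify the search procedure, the feedback signals, or the models, so every name here (search_with_feedback, compiles, behaves_like_original) is hypothetical, and the behavioural check is an assumption about how semantic correctness might be tested. To keep the example runnable without a C toolchain, candidates are Python snippets and Python's built-in compile() stands in for a real compiler.

```python
from typing import Callable, Iterable, Optional


def search_with_feedback(
    candidates: Iterable[str],
    compiles: Callable[[str], bool],
    behaves_like_original: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that survives both automatic checks, else None."""
    for source in candidates:
        if not compiles(source):
            continue  # compiler rejected it: discard this candidate
        if behaves_like_original(source):
            return source  # compiles and matches the reference behaviour
    return None


# Toy stand-ins (assumptions, not the paper's setup): "compilation" is
# Python's built-in compile(), and the reference behaviour is a function
# named f that doubles its input.
def toy_compiles(source: str) -> bool:
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def toy_behaves_like_original(source: str) -> bool:
    namespace: dict = {}
    try:
        exec(source, namespace)
        return all(namespace["f"](x) == 2 * x for x in (0, 1, 7, -3))
    except Exception:
        return False


# Hypothetical model samples, ordered as a sampler might emit them.
samples = [
    "def f(x) return x + x",        # does not compile
    "def f(x):\n    return x * x",  # compiles, wrong semantics
    "def f(x):\n    return x + x",  # compiles and matches
]
best = search_with_feedback(samples, toy_compiles, toy_behaves_like_original)
```

In this toy run the first sample is rejected by the compile check and the second by the behavioural check, so the third is returned. A real pipeline would presumably replace the two callables with an actual compiler invocation and a comparison against the original binary's behaviour.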
