Semantics-Recovering Decompilation through Neural Machine Translation
Ruigang Liang, Ying Cao, Peiwei Hu, Jinwen He, Kai Chen

TL;DR
This paper introduces SEAM, a neural machine translation-based decompiler that translates low-level code into high-level, semantically rich, human-readable code and outperforms prior neural machine translation models in accuracy.
Contribution
The paper presents a novel neural decompilation method that improves semantic recovery and code readability, reducing reliance on manual rule-based approaches.
Findings
SEAM achieves 94.41% accuracy in decompilation tasks.
Semantic information recovery accuracy is 92.64%, comparable to state-of-the-art decompilers.
SEAM outperforms prior neural machine translation models in accuracy.
Abstract
Decompilation transforms low-level programming languages (PLs) (e.g., binary code) into high-level PLs (e.g., C/C++). It is widely used when analysts perform security analysis, such as vulnerability search and malware analysis, on software (systems) whose source code is unavailable. However, current decompilation tools usually require years of expert effort to write the decompilation rules, and those rules need long-term maintenance as the syntax of the high-level or low-level PL changes. Also, an ideal decompiler should concisely generate high-level PL code that has similar functionality to the source low-level PL and carries semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing manually-defined rule-based decompilation techniques only functionally restore the low-level PL to a similar high-level PL and are still powerless to…
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
