DIRE: A Neural Approach to Decompiled Identifier Naming
Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, Bogdan Vasilescu

TL;DR
DIRE is a probabilistic neural approach to recovering variable names in decompiled binaries. It uses lexical and structural cues from the decompiler output to predict the names used in the original source code, with the goal of making decompiled code easier to understand.
Contribution
This paper introduces DIRE, a novel neural method for variable name recovery in decompiled code. It also presents a technique for generating corpora to train and evaluate such models, which the authors use to build a large corpus.
Findings
DIRE predicts variable names identical to those in the original source code up to 74.3% of the time.
A new corpus of 164,632 unique x86-64 binaries, generated from C projects mined from GitHub, was created for training and evaluation.
The approach leverages both lexical and structural information from decompiled code.
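The 74.3% figure is a strict exact-match score: a prediction counts only when it is identical to the name the developer used in the original source. The sketch below illustrates that metric under stated assumptions. Predicted and original names are given as parallel lists of strings, one entry per variable. The function name `exact_match_accuracy` and the example names are hypothetical and are not taken from the DIRE implementation.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], original: Sequence[str]) -> float:
    """Fraction of variables whose predicted name equals the original name.

    Hypothetical helper: assumes one predicted name per variable, aligned
    index-by-index with the ground-truth names from the original source.
    """
    if len(predicted) != len(original):
        raise ValueError("predicted and original must have the same length")
    if not original:
        return 0.0  # no variables to score
    # A hit requires an exact string match; near-misses score zero.
    hits = sum(1 for p, o in zip(predicted, original) if p == o)
    return hits / len(original)


# Hypothetical example: a model renames four decompiler placeholder variables.
original = ["buf", "len", "i", "fd"]
predicted = ["buf", "size", "i", "fd"]
print(exact_match_accuracy(predicted, original))  # 0.75
```

Because the match is exact, a plausible synonym such as `size` for `len` counts as a miss. The metric therefore gives a conservative view of how readable the recovered names are.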
Abstract
The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information that is lost during the compilation process (e.g., structure and type information). Unfortunately, they do not reconstruct semantically meaningful variable names, which are known to increase code understandability. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
