Disa: Accurate Learning-based Static Disassembly with Attentions
Peicheng Wang, Monika Santra, Mingyu Liu, Cong Sun, Dongrui Zeng, Gang Tan

TL;DR
Disa is a deep learning-based disassembly method that improves accuracy in identifying instruction boundaries and function entry points, especially in obfuscated binaries, by leveraging self-attention mechanisms.
Contribution
Disa introduces a novel self-attention-based learning approach for static disassembly, enhancing boundary detection and CFG accuracy over prior methods.
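As background for the self-attention idea, here is a minimal pure-Python sketch of scaled dot-product attention split across several heads, applied to a few toy vectors that stand in for superset-instruction embeddings. This is not Disa's architecture. It simplifies by using identity query/key/value projections on each head's slice of the embedding, whereas a trained transformer learns separate W_Q, W_K, W_V matrices per head plus an output projection. The function names and toy vectors are hypothetical.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """q, k, v: lists of equal-length vectors, one per (candidate) instruction."""
    d = len(q[0])
    outputs, weights = [], []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights

def multi_head_self_attention(x, num_heads):
    d_model = len(x[0])
    assert d_model % num_heads == 0, "embedding size must divide evenly into heads"
    d_head = d_model // num_heads
    outputs = [[] for _ in x]
    all_weights = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        # Simplification (assumption): identity Q/K/V projections per head.
        out, w = scaled_dot_product_attention(part, part, part)
        all_weights.append(w)
        for i, row in enumerate(out):
            outputs[i].extend(row)  # concatenate head outputs
    return outputs, all_weights

# Hypothetical 4-dimensional embeddings for three candidate instructions.
x = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 0.5, -0.5],
     [1.0, 1.0, 0.0, 0.0]]
out, weights = multi_head_self_attention(x, num_heads=2)
```

Each head lets every candidate instruction weigh every other candidate. This is the kind of pairwise correlation the abstract says Disa learns, although the paper's actual embeddings, projections, and layer stack are not described here.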
Findings
Outperforms previous deep-learning disassembly methods in function entry-point detection.
Achieves 9.1% and 13.2% F1-score improvements on obfuscated binaries.
Improves CFG accuracy, with an 18.5% improvement in memory block precision.
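For reference, entry-point F1 is conventionally computed by comparing predicted and ground-truth function start addresses as sets. The sketch below assumes exact address matching. The page does not state the paper's exact evaluation protocol, or whether the reported improvements are absolute points or relative gains. The addresses are made up for illustration.

```python
def precision_recall_f1(predicted, truth):
    """Set-based precision, recall, and F1 over function entry-point addresses."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical ground truth and predictions.
truth = {0x1000, 0x1040, 0x1080, 0x10C0}
predicted = {0x1000, 0x1040, 0x1200}      # 2 correct, 1 spurious, 2 missed
p, r, f1 = precision_recall_f1(predicted, truth)
```

Here precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7. F1 punishes both spurious entry points (for example, from junk bytes inserted by obfuscators) and missed ones.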
Abstract
For security domains that rely on reverse engineering, such as vulnerability detection, malware analysis, and binary hardening, disassembly is crucial yet challenging. The fundamental challenge of disassembly is identifying instruction and function boundaries. Classic approaches rely on file-format assumptions and architecture-specific heuristics to guess these boundaries, resulting in incomplete and incorrect disassembly, especially when the binary is obfuscated. Recent advances in disassembly have demonstrated that deep learning can improve both its accuracy and efficiency. In this paper, we propose Disa, a new learning-based disassembly approach that applies multi-head self-attention to the information in superset instructions to learn the correlations among instructions, enabling it to infer function entry points and instruction boundaries. Disa can further identify…
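"Superset instructions" refers to superset disassembly: decoding a candidate instruction at every byte offset, so the true instructions are guaranteed to be a subset of the candidates, and a later stage (in Disa, a learned model) decides which candidates are real. The minimal sketch below rests on several assumptions. It stands in for a real x86 decoder with a tiny hypothetical length table covering six one-byte x86 opcodes and treats every other byte as undecodable. The helper names `superset_disassemble` and `linear_chain` are illustrative only.

```python
from typing import Dict, List, Optional

# Assumption: toy length table for a few real x86 opcodes (full instruction
# lengths). A real superset disassembler would use a complete decoder.
TOY_LENGTHS = {
    0x55: 1,  # push rbp / push ebp
    0x90: 1,  # nop
    0xC3: 1,  # ret
    0xEB: 2,  # jmp rel8
    0xB8: 5,  # mov eax, imm32
    0xE8: 5,  # call rel32
}

def superset_disassemble(code: bytes) -> Dict[int, Optional[int]]:
    """Decode a candidate at every byte offset; None marks an invalid candidate."""
    candidates: Dict[int, Optional[int]] = {}
    for off in range(len(code)):
        length = TOY_LENGTHS.get(code[off])
        # Reject unknown opcodes and instructions that run past the buffer.
        if length is None or off + length > len(code):
            candidates[off] = None
        else:
            candidates[off] = length
    return candidates

def linear_chain(candidates: Dict[int, Optional[int]], start: int) -> List[int]:
    """Follow fall-through from `start`, returning the instruction offsets visited."""
    chain, off = [], start
    while candidates.get(off) is not None:
        chain.append(off)
        off += candidates[off]
    return chain

code = bytes([0x55, 0xB8, 0x90, 0x90, 0x90, 0x90, 0xC3])
superset = superset_disassemble(code)
from_start = linear_chain(superset, 0)   # push, mov eax imm32, ret
hidden = linear_chain(superset, 2)       # nops hidden inside the mov's immediate
```

The two chains overlap on bytes 2 through 5 and reconverge at offset 6. This ambiguity is why boundary identification is hard, and it is what a learned model over the superset candidates must resolve.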
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques · Security and Verification in Computing
