Disa: Accurate Learning-based Static Disassembly with Attentions

Peicheng Wang; Monika Santra; Mingyu Liu; Cong Sun; Dongrui Zeng; Gang Tan

arXiv:2507.07246·cs.CR·July 11, 2025

Open Access

TL;DR

Disa is a deep learning-based disassembly method that improves accuracy in identifying instruction boundaries and function entry points, especially in obfuscated binaries, by leveraging self-attention mechanisms.

Contribution

Disa introduces a novel self-attention-based learning approach for static disassembly, enhancing boundary detection and CFG accuracy over prior methods.
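For readers unfamiliar with the mechanism named above, the following is a minimal, dependency-free sketch of multi-head self-attention over a sequence of token vectors. It is not Disa's architecture: the function names are hypothetical, and the learned query/key/value and output projections of a real model are assumed to be identity maps, so each head simply attends over its own slice of the embedding.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([
            sum(weights[j] * v[j][c] for j in range(len(v)))
            for c in range(len(v[0]))
        ])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split each embedding into num_heads slices, attend per slice, concatenate.

    Assumption: identity projections stand in for learned weight matrices.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        heads.append(attention_head(part, part, part))
    # Concatenate the per-head outputs back into one vector per token.
    return [
        [value for h in range(num_heads) for value in heads[h][i]]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    for row in multi_head_self_attention(tokens, num_heads=2):
        print([round(value, 3) for value in row])
```

Each output vector is a weighted average of all input vectors, which is how attention lets the representation of one candidate instruction depend on every other candidate in the window.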

Findings

01. Outperforms previous deep-learning disassembly methods in function entry-point detection.

02. Achieves 9.1% and 13.2% F1-score improvements on obfuscated binaries.

03. Improves CFG accuracy with 18.5% better memory block precision.

Abstract

Disassembly is crucial yet challenging for reverse-engineering-related security domains such as vulnerability detection, malware analysis, and binary hardening. The fundamental challenge of disassembly is to identify instruction and function boundaries. Classic approaches rely on file-format assumptions and architecture-specific heuristics to guess the boundaries, resulting in incomplete and incorrect disassembly, especially when the binary is obfuscated. Recent advances in disassembly have demonstrated that deep learning can improve both the accuracy and efficiency of disassembly. In this paper, we propose Disa, a new learning-based disassembly approach that applies multi-head self-attention to information about superset instructions to learn the correlations among instructions, and can thereby infer function entry-points and instruction boundaries. Disa can further identify…
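To make "superset instructions" concrete: a superset disassembler attempts to decode an instruction at every byte offset, so the true instruction stream is a subset of the candidates. The sketch below illustrates that idea only; it is not Disa's implementation. The length table is a toy covering a handful of one-byte x86 opcodes, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Toy table (assumption): total instruction length for a few x86 opcodes.
# A real decoder must also handle prefixes, ModRM/SIB bytes, and more.
TOY_LENGTHS: Dict[int, int] = {
    0x55: 1,  # push rbp
    0x90: 1,  # nop
    0xC3: 1,  # ret
    0xEB: 2,  # jmp rel8
    0xB8: 5,  # mov eax, imm32
    0xE8: 5,  # call rel32
}


def decode_length(code: bytes, offset: int) -> Optional[int]:
    """Length of the instruction decoding at offset, or None if invalid."""
    length = TOY_LENGTHS.get(code[offset])
    if length is None or offset + length > len(code):
        return None
    return length


def superset_candidates(code: bytes) -> List[Optional[int]]:
    """Try to decode at every byte offset (the superset)."""
    return [decode_length(code, off) for off in range(len(code))]


def linear_sweep(code: bytes, start: int = 0) -> List[int]:
    """Offsets reached by decoding sequentially from a known start."""
    offsets = []
    off = start
    while off < len(code):
        length = decode_length(code, off)
        if length is None:
            break
        offsets.append(off)
        off += length
    return offsets


if __name__ == "__main__":
    # push rbp; mov eax, 0x0000C390; ret
    code = bytes([0x55, 0xB8, 0x90, 0xC3, 0x00, 0x00, 0xC3])
    print(superset_candidates(code))  # [1, 5, 1, 1, None, None, 1]
    print(linear_sweep(code))         # [0, 1, 6]
```

Offsets 2 and 3 decode as valid `nop` and `ret` instructions even though they lie inside the immediate operand of the `mov`. Separating such spurious candidates from true instruction starts is the classification problem that a learning-based disassembler has to solve.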

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques · Security and Verification in Computing