Disassembling Obfuscated Executables with LLM
Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang

TL;DR
DisasLLM introduces an LLM-based approach for disassembling heavily obfuscated executables, significantly improving accuracy over existing methods by understanding binary semantics.
Contribution
The paper presents DisasLLM, a novel LLM-driven disassembler that effectively analyzes obfuscated executables by combining instruction classification and strategic disassembly.
Findings
DisasLLM outperforms state-of-the-art disassemblers on heavily obfuscated binaries.
The LLM-based classifier accurately identifies correctly decoded instructions.
End-to-end disassembly accuracy is significantly improved with DisasLLM.
Abstract
Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without an in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven disassembler to overcome the challenge in analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated…
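To make the junk-byte problem and the two-component design concrete, here is a minimal Python sketch. It is not the paper's implementation: the decoder covers only five real x86 opcodes, `guided_sweep` is a hypothetical strategy that skips one byte whenever the classifier rejects a decoded instruction, and `stub_classifier` is a simple bounds check standing in for the LLM-based classifier. All function names are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Tiny subset of x86 opcodes -> (mnemonic, total instruction length in bytes).
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp rel8", 2),
    0xB8: ("mov eax, imm32", 5),
    0xE8: ("call rel32", 5),
}

Insn = Tuple[int, str, int]  # (offset, mnemonic, length)


def decode(code: bytes, off: int) -> Optional[Insn]:
    """Decode one instruction at `off`, or return None if it cannot be decoded."""
    entry = OPCODES.get(code[off])
    if entry is None:
        return None
    name, length = entry
    if off + length > len(code):
        return None  # operand bytes run past the end of the buffer
    return (off, name, length)


def linear_sweep(code: bytes) -> List[Insn]:
    """Plain linear sweep: trust every decodable instruction."""
    out, off = [], 0
    while off < len(code):
        insn = decode(code, off)
        if insn is None:
            off += 1  # undecodable byte, skip it
            continue
        out.append(insn)
        off += insn[2]
    return out


def guided_sweep(code: bytes,
                 is_correct: Callable[[bytes, Insn], bool]) -> List[Insn]:
    """Hypothetical strategy: keep an instruction only if the classifier accepts it."""
    out, off = [], 0
    while off < len(code):
        insn = decode(code, off)
        if insn is None or not is_correct(code, insn):
            off += 1  # treat the byte as junk and retry at the next offset
            continue
        out.append(insn)
        off += insn[2]
    return out


def stub_classifier(code: bytes, insn: Insn) -> bool:
    """Stand-in for the LLM classifier (assumption): reject calls that leave the buffer."""
    off, name, length = insn
    if name == "call rel32":
        rel = int.from_bytes(code[off + 1:off + 5], "little", signed=True)
        return 0 <= off + length + rel < len(code)
    return True


# jmp +1 ; junk byte 0xE8 ; mov eax, 0x2a ; ret
CODE = bytes.fromhex("eb01e8b82a000000c3")

if __name__ == "__main__":
    print("linear:", linear_sweep(CODE))
    print("guided:", guided_sweep(CODE, stub_classifier))
```

In this example the junk byte `0xE8` sits right after a jump that skips it. The linear sweep decodes it as a `call` and swallows the first four bytes of the real `mov`, while the guided sweep discards the bogus `call` and recovers the `mov` at offset 3. A real classifier would judge the instruction in the context of its surrounding snippet rather than apply a fixed rule.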
Taxonomy
Topics: Digital and Cyber Forensics
