Disassembling Obfuscated Executables with LLM

Huanyao Rong; Yue Duan; Hang Zhang; XiaoFeng Wang; Hongbo Chen; Shengchen Duan; Shen Wang

arXiv:2407.08924·cs.CR·July 15, 2024

TL;DR

DisasLLM introduces an LLM-based approach for disassembling heavily obfuscated executables, significantly improving accuracy over existing methods by understanding binary semantics.

Contribution

The paper presents DisasLLM, a novel LLM-driven disassembler that effectively analyzes obfuscated executables by combining instruction classification and strategic disassembly.

Findings

01 DisasLLM outperforms state-of-the-art disassemblers on heavily obfuscated binaries.

02 The LLM-based classifier accurately identifies correctly decoded instructions.

03 End-to-end disassembly accuracy is significantly improved with DisasLLM.

Abstract

Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven disassembler to overcome the challenge in analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated…
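To make the junk-byte problem concrete, here is a minimal Python sketch of how a single junk byte can desynchronize a linear-sweep disassembler. It is not DisasLLM and not taken from the paper. The four-opcode instruction set (`TOY_ISA`), the byte string `PROGRAM`, and both function names are invented for illustration. The second function follows the unconditional jump instead of decoding the bytes after it, which is plain control-flow following, not the paper's LLM-guided strategy.

```python
# Toy ISA invented for this illustration (not x86, not from the paper).
# opcode -> (mnemonic, total instruction length in bytes)
TOY_ISA = {
    0x01: ("push", 2),  # push imm8
    0x02: ("add", 1),
    0x03: ("jmp", 2),   # jmp rel8, relative to the next instruction
    0x04: ("ret", 1),
}


def decode(code, offset):
    """Decode one toy instruction; return (mnemonic, operand, length)."""
    mnemonic, length = TOY_ISA[code[offset]]
    operand = code[offset + 1] if length == 2 else None
    return mnemonic, operand, length


def linear_sweep(code):
    """Decode back to back from offset 0, treating every byte as code."""
    offset, listing = 0, []
    while offset < len(code):
        mnemonic, operand, length = decode(code, offset)
        listing.append((offset, mnemonic, operand))
        offset += length
    return listing


def follow_control_flow(code):
    """Decode from offset 0, but take unconditional jumps instead of
    decoding the bytes that follow them."""
    offset, listing = 0, []
    while offset < len(code):
        mnemonic, operand, length = decode(code, offset)
        listing.append((offset, mnemonic, operand))
        if mnemonic == "ret":
            break
        offset += length
        if mnemonic == "jmp":
            offset += operand  # skip to the jump target
    return listing


# push 5; jmp +1; <junk byte 0x01>; add; ret
# The junk byte at offset 4 is never executed, but it looks like the
# first byte of a two-byte "push".
PROGRAM = bytes([0x01, 0x05, 0x03, 0x01, 0x01, 0x02, 0x04])

for name, fn in (("linear sweep", linear_sweep),
                 ("follow control flow", follow_control_flow)):
    print(name)
    for offset, mnemonic, operand in fn(PROGRAM):
        print(f"  {offset:02d}: {mnemonic} {'' if operand is None else operand}")
```

In this toy program the linear sweep decodes the junk byte as `push` and swallows the real `add` at offset 5 as its operand, so `add` never appears in the listing. Real obfuscators typically hide the jump behind opaque predicates or computed targets, so simple control-flow following is not enough in practice; this is the gap that, according to the abstract, calls for an understanding of the binary's semantics.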

Taxonomy

Topics: Digital and Cyber Forensics