REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering
Darrin Lea, James Ghawaly, Golden Richard III, Aisha Ali-Gombe, Andrew Case

TL;DR
REx86 is a fine-tuned local large language model that significantly improves x86 assembly reverse engineering by generating more accurate code comments and explanations, while avoiding the privacy concerns of cloud-based models.
Contribution
This work introduces REx86, a state-of-the-art local LLM fine-tuned for x86 RE, demonstrating improved performance over base models and highlighting the importance of domain-specific data.
Findings
REx86 reduces test-set cross-entropy loss by 64.2%.
REx86 improves semantic similarity by 20.3%.
REx86 increases the correct-solve rate from 31% to 53%.
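The percentages above mix two kinds of change, so a small worked sketch may help when reading them. It assumes the loss and similarity figures are relative changes against the base model, which the summary does not state explicitly. The helper names (`relative_change`, `cosine_similarity`) and the toy vectors are hypothetical illustrations, not the authors' evaluation code.

```python
import math


def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Correct-solve rate from the findings: 31% -> 53%.
    # That is a gain of 22 percentage points, or roughly a 71% relative increase.
    print(f"absolute gain: {53.0 - 31.0:.1f} points")
    print(f"relative gain: {relative_change(31.0, 53.0):.1f}%")

    # Toy embedding vectors (hypothetical, not from the paper) standing in for
    # a generated comment and its ground-truth comment.
    generated = [0.2, 0.7, 0.1]
    reference = [0.25, 0.6, 0.2]
    print(f"cosine similarity: {cosine_similarity(generated, reference):.3f}")
```

Read this way, the correct-solve improvement is a 22-point absolute gain, which is not directly comparable to the 64.2% and 20.3% figures if those are relative.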
Abstract
Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency through automated comprehension and commenting, but cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities. We evaluate parameter-efficient fine-tuned local LLMs for assisting with x86 RE tasks in these settings. Eight open-weight models across the CodeLlama, Qwen2.5-Coder, and CodeGemma series are fine-tuned on a custom curated dataset of 5,981 x86 assembly examples. We evaluate them quantitatively and identify the fine-tuned Qwen2.5-Coder-7B as the top performer, which we name REx86. REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground…
