ReCopilot: Reverse Engineering Copilot in Binary Analysis
Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, Lingyun Ying

TL;DR
ReCopilot is a domain-specific large language model for binary analysis. It combines staged training with data-flow and call-graph context to outperform existing tools on tasks such as function name recovery and variable type inference.
Contribution
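The paper introduces ReCopilot, an LLM for binary analysis trained on a purpose-built dataset through continued pretraining, supervised fine-tuning, and direct preference optimization, with data-flow and call-graph context and test-time scaling to strengthen reasoning.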
Findings
ReCopilot achieves state-of-the-art results in binary analysis benchmarks.
It outperforms both existing tools and general-purpose LLMs by 13% on the evaluated tasks.
Domain-specific training improves binary analysis performance.
Abstract
Binary analysis plays a pivotal role in security domains such as malware detection and vulnerability discovery, yet it remains labor-intensive and heavily reliant on expert knowledge. General-purpose large language models (LLMs) perform well in programming analysis on source code, while binary-specific LLMs are underexplored. In this work, we present ReCopilot, an expert LLM designed for binary analysis tasks. ReCopilot integrates binary code knowledge through a meticulously constructed dataset and a training pipeline comprising continued pretraining (CPT), supervised fine-tuning (SFT), and direct preference optimization (DPO) stages. It leverages variable data flow and call graph information to enhance context awareness and employs test-time scaling to improve reasoning capabilities. Evaluations on a comprehensive binary analysis benchmark demonstrate that ReCopilot achieves state-of-the-art performance in tasks such as…
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
