BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models
Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Benlong Wu, Li Hu, Gangyang Li, Weiming Zhang, Nenghai Yu

TL;DR
BinMetric is a new benchmark for evaluating large language models on binary analysis tasks, addressing a key gap in standardized assessment tools and revealing current strengths and limitations of LLMs in this domain.
Contribution
The paper introduces BinMetric, a benchmark of 1,000 questions drawn from 20 real-world open-source projects and spanning 6 binary analysis tasks, enabling standardized evaluation of LLMs in this domain.
Findings
LLMs show strong potential in binary analysis tasks
Challenges remain in binary lifting and assembly synthesis
Benchmark establishes a new leaderboard for LLMs in binary analysis
Abstract
Binary analysis remains pivotal in software security, offering insights into compiled programs without source code access. As large language models (LLMs) continue to excel in diverse language understanding and generation tasks, their potential in decoding complex binary data structures becomes evident. However, the lack of standardized benchmarks in this domain limits the assessment and comparison of LLMs' capabilities in binary analysis and hinders the progress of research and practical applications. To bridge this gap, we introduce BinMetric, a comprehensive benchmark designed specifically to evaluate the performance of large language models on binary analysis tasks. BinMetric comprises 1,000 questions derived from 20 real-world open-source projects across 6 practical binary analysis tasks, including decompilation, code summarization, assembly instruction generation, etc., which…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Security and Verification in Computing
