REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Jun Yeon Won; Xin Jin; Shiqing Ma; Zhiqiang Lin

arXiv:2604.27319 · cs.CR · May 1, 2026

TL;DR

REBench is a standardized benchmark dataset for evaluating large language models on binary reverse engineering tasks such as function and variable name recovery and type inference. It addresses the inconsistent datasets, preprocessing pipelines, and evaluation metrics used in prior work.

Contribution

The paper introduces REBench, a unified, knowledge-base-driven benchmark dataset that enables fair and consistent evaluation of LLMs on binary reverse engineering tasks.

Findings

01. LLMs show significant difficulty in complex reverse engineering tasks

02. REBench consolidates diverse datasets into a single, comprehensive benchmark

03. The methodology preserves task difficulty and applicability across architectures

Abstract

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the rapid growth of research in this area, progress has been hindered by the absence of a standardized dataset. Existing studies rely on disparate datasets, preprocessing pipelines, and evaluation metrics, making fair comparisons between approaches difficult and obscuring a clear understanding of LLM capabilities in binary analysis. To address these challenges, we present REBench, a comprehensive benchmark dataset for evaluating LLMs on binary reverse engineering tasks. REBench consolidates a superset of existing datasets, comprising hundreds of millions of lines of source…
