ComBench: A Repo-level Real-world Benchmark for Compilation Error Repair
Jia Li, Zeyang Zhuang, Zhuangbin Chen, Yuxin Su, Wei Meng, and Michael R. Lyu

TL;DR
ComBench is a novel, repository-level benchmark for real-world C/C++ compilation error repair, enabling more accurate evaluation of AI models' effectiveness in practical software development scenarios.
Contribution
The paper introduces a systematic, automated framework for building a high-quality, reproducible benchmark from GitHub projects, addressing the limitations of existing single-file datasets.
Findings
GPT-5 achieves 73% syntactic success but only 41% semantic correctness.
Different models show distinct strengths for various error types.
ComBench enables realistic evaluation of AI-based compilation error repair methods.
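The gap between the two GPT-5 figures is easier to read with the metrics spelled out. The sketch below is illustrative only: it assumes "syntactic success" means the patched repository compiles and "semantic correctness" means the patch also preserves the intended behaviour. The summary does not give the paper's exact definitions. The `RepairResult` record, the `repair_rates` helper, and the sample data are hypothetical and do not come from ComBench.

```python
from dataclasses import dataclass


@dataclass
class RepairResult:
    """Outcome of one repair attempt (hypothetical record layout)."""
    compiles: bool          # assumed: the patched repository builds without errors
    semantically_ok: bool   # assumed: the patch also preserves intended behaviour


def repair_rates(results):
    """Return (syntactic_rate, semantic_rate) as fractions of all attempts."""
    total = len(results)
    if total == 0:
        return 0.0, 0.0
    syntactic = sum(r.compiles for r in results)
    # A patch is counted as semantically correct only if it also compiles.
    semantic = sum(r.compiles and r.semantically_ok for r in results)
    return syntactic / total, semantic / total


# Made-up sample: three of four patches compile, two are also correct.
sample = [
    RepairResult(compiles=True, semantically_ok=True),
    RepairResult(compiles=True, semantically_ok=False),
    RepairResult(compiles=False, semantically_ok=False),
    RepairResult(compiles=True, semantically_ok=True),
]
syntactic_rate, semantic_rate = repair_rates(sample)
print(f"syntactic: {syntactic_rate:.0%}, semantic: {semantic_rate:.0%}")
```

Under these assumptions the semantic rate can never exceed the syntactic rate, which is consistent with the ordering of the reported 73% and 41%.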
Abstract
Compilation errors pose pervasive and critical challenges in software development, significantly hindering productivity. Therefore, Automated Compilation Error Repair (ACER) techniques are proposed to mitigate these issues. Despite recent advancements in ACER, its real-world performance remains poorly evaluated. This can be largely attributed to the limitations of existing benchmarks, i.e., decontextualized single-file data, lack of authentic source diversity, and biased local task modeling that ignores crucial repository-level complexities. To bridge this critical gap, we propose ComBench, the first repository-level, reproducible real-world benchmark for C/C++ compilation error repair. ComBench is constructed through a novel, automated framework that systematically mines real-world failures from the GitHub CI histories of large-scale open-source projects. Our framework contributes…
