TL;DR
This paper introduces CppPerf, comprising a mining pipeline (CppPerf-Mine) and a benchmark dataset (CppPerf-DB) of execution-time-improving C++ commits, addressing the lack of realistic, executable C++ performance benchmarks.
Contribution
It presents CppPerf-Mine, a pipeline that mines real-world C++ performance patches from GitHub, and creates CppPerf-DB, a benchmark dataset for evaluating repair tools.
Findings
In a preliminary study, OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB.
CppPerf-DB contains 347 manually verified patches from 42 mature C++ repositories.
39% of patches involve multiple files, enabling repository-level evaluation.
Abstract
Recent progress in automated repair of performance bugs demands realistic, executable benchmarks. However, existing C++ performance benchmarks are largely built from competitive programming submissions, and recent real-world benchmarks predominantly target Python and .NET. To fill this gap, we present CppPerf-Mine, a configurable pipeline that mines execution-time-improving patches from open-source C++ repositories on GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage that produces fully reproducible Docker images for each patch. Using CppPerf-Mine, we build CppPerf-DB, a benchmark comprising 347 manually verified patches from 42 mature C++ repositories, 39% of which are multi-file, enabling the evaluation of repository-level repair tools. In our preliminary study, OpenHands correctly fixes only 13.5% of the patches in…
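The three stages named in the abstract (structural commit filtering, a commit classifier, and a build & test check) can be pictured with a minimal sketch. Everything here is an assumption made for illustration and is not taken from CppPerf-Mine: the `Commit` record, the extension list, the keyword heuristic standing in for the LLM-based classifier, and the `builds_and_passes` callback standing in for the containerized build & test stage.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed set of C++ source/header extensions; the real pipeline's rules are not given here.
CPP_EXTENSIONS = (".cc", ".cpp", ".cxx", ".h", ".hh", ".hpp", ".hxx")


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: hash, message, and changed file paths."""
    sha: str
    message: str
    files: Tuple[str, ...]


def touches_cpp(commit: Commit) -> bool:
    """Structural filter: keep commits that change at least one C++ file."""
    return any(path.lower().endswith(CPP_EXTENSIONS) for path in commit.files)


def keyword_classifier(commit: Commit) -> bool:
    """Stand-in for the LLM-based classifier: a plain keyword match on the message."""
    keywords = ("perf", "speed", "faster", "optimiz")
    message = commit.message.lower()
    return any(keyword in message for keyword in keywords)


def mine(
    commits: Iterable[Commit],
    classify: Callable[[Commit], bool] = keyword_classifier,
    builds_and_passes: Callable[[Commit], bool] = lambda commit: True,
) -> List[Commit]:
    """Run the three stages in order and return the commits that survive all of them."""
    kept = []
    for commit in commits:
        if not touches_cpp(commit):
            continue  # stage 1: structural filtering
        if not classify(commit):
            continue  # stage 2: performance-commit classification
        if not builds_and_passes(commit):
            continue  # stage 3: placeholder for the build & test stage
        kept.append(commit)
    return kept


def multi_file_share(commits: List[Commit]) -> float:
    """Fraction of commits that change more than one file."""
    if not commits:
        return 0.0
    return sum(1 for commit in commits if len(commit.files) > 1) / len(commits)
```

Ordering the stages from cheapest to most expensive means the costly build step only runs on commits that already passed the lighter checks; whether CppPerf-Mine orders or implements its stages this way is not stated in the text above.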
