CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits

Tommy Ho; Khashayar Etemadi; Zhendong Su

arXiv:2605.10890·cs.SE·May 12, 2026

TL;DR

This paper introduces CppPerf, a pipeline and dataset for benchmarking performance improvements in C++ commits, addressing the lack of realistic, executable C++ performance benchmarks.

Contribution

It presents CppPerf-Mine, a pipeline that mines real-world, execution-time-improving C++ patches from GitHub, and uses it to build CppPerf-DB, a benchmark dataset for evaluating automated repair tools.

Findings

01. In a preliminary study, OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB.

02. CppPerf-DB contains 347 manually verified patches from 42 mature C++ repositories.

03. 39% of the patches span multiple files, enabling repository-level evaluation.

Abstract

Recent progress in automated repair of performance bugs demands realistic, executable benchmarks. However, existing C++ performance benchmarks are largely built from competitive programming submissions, and recent real-world benchmarks predominantly target Python and .NET. To fill this gap, we present CppPerf-Mine, a configurable pipeline that mines execution-time-improving patches from open-source C++ repositories on GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage that produces fully reproducible Docker images for each patch. Using CppPerf-Mine, we build CppPerf-DB, a benchmark comprising 347 manually verified patches from 42 mature C++ repositories, 39% of which are multi-file, enabling the evaluation of repository-level repair tools. In our preliminary study, OpenHands correctly fixes only 13.5% of the patches in…
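The abstract describes CppPerf-Mine as a funnel: cheap structural filtering first, then an LLM-based classifier, then a containerized build & test stage. The C++ sketch below shows only how such a funnel might be wired, under loudly stated assumptions. The `Commit` struct, the extension list, the `max_files` threshold, and the predicate signatures are all hypothetical and are not CppPerf-Mine's actual schema or filtering criteria. The LLM classifier and the Docker build & test stage are stubbed as injected predicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical commit record; field names are illustrative only.
struct Commit {
    std::string sha;
    std::string message;
    std::vector<std::string> changed_files;
};

// True if the path ends with a common C/C++ source or header extension
// (assumed list, not taken from the paper).
bool is_cpp_file(const std::string& path) {
    static const std::vector<std::string> exts = {".cpp", ".cc", ".cxx", ".c",
                                                  ".hpp", ".hh", ".h"};
    return std::any_of(exts.begin(), exts.end(), [&](const std::string& ext) {
        return path.size() >= ext.size() &&
               path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
    });
}

// Stage 1, structural filter. Assumed rule: the commit changes between 1 and
// max_files files, and at least one of them is a C/C++ file.
bool passes_structural_filter(const Commit& c, std::size_t max_files) {
    return !c.changed_files.empty() && c.changed_files.size() <= max_files &&
           std::any_of(c.changed_files.begin(), c.changed_files.end(), is_cpp_file);
}

// Stages 2 and 3 are injected: an LLM classifier and a containerized
// build & test check (both stubbed as hypothetical predicates here).
using Stage = std::function<bool(const Commit&)>;

std::vector<Commit> run_pipeline(const std::vector<Commit>& candidates,
                                 std::size_t max_files,
                                 const Stage& llm_says_perf,
                                 const Stage& builds_and_tests_pass) {
    assert(max_files > 0 && "max_files must be positive");
    std::vector<Commit> kept;
    for (const Commit& c : candidates) {
        // Cheapest check first, so the expensive stages see fewer commits.
        if (!passes_structural_filter(c, max_files)) continue;
        if (!llm_says_perf(c)) continue;
        if (!builds_and_tests_pass(c)) continue;
        kept.push_back(c);
    }
    return kept;
}
```

Ordering the stages from cheapest to most expensive is the point of the design: an LLM call or a Docker build per commit is costly, so the structural filter prunes as many candidates as possible before either runs.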

Code & Models

Repositories

https://doi.org/10.5281/zenodo.20097425
