PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Huihao Jing; Wenbin Hu; Haochen Shi; Hanyu Yang; Sirui Zhang; Shaojin Chen; Haoran Li; Yangqiu Song

arXiv:2605.15222·cs.SE·May 18, 2026

TL;DR

PerfCodeBench is a new benchmark for evaluating large language models on high-performance, systems-level code optimization tasks, highlighting the gap between generated code and expert-optimized solutions.

Contribution

Introduces PerfCodeBench, an executable benchmark for assessing LLMs on system-level code optimization. It measures both correctness and runtime-oriented efficiency and reports evaluation results for a broad set of state-of-the-art LLMs.

Findings

01. Significant gap between LLM-generated code and expert-optimized implementations.

02. Models struggle with parallelism and GPU-related tasks.

03. Current models are weak in cross-language robustness and efficiency.

Abstract

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness or algorithmic problem solving, while realistic systems-level optimization is still underexplored. To address this gap, we introduce PerfCodeBench, an executable benchmark for evaluating LLMs on high-performance code optimization. The tasks require system-level implementation choices, hardware-aware optimization, and careful handling of performance bottlenecks. Each task includes executable correctness checks, a baseline implementation, and a reference optimized solution. This allows us to evaluate both correctness and runtime-oriented efficiency. Our evaluation on a broad set of state-of-the-art LLMs shows a clear gap between model-generated code and…
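To make the task structure described in the abstract concrete, here is a minimal Python sketch of how a correctness check, a baseline, and a reference optimized solution could be combined into one score. It is an illustration under stated assumptions, not the benchmark's actual harness: the `Task` record, the `evaluate` and `best_time` helpers, the speedup ratio, and the toy sum-of-squares task are all hypothetical and do not come from the paper.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence

Solution = Callable[[Sequence[int]], int]


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not from the paper."""
    baseline: Solution                   # unoptimized but correct implementation
    reference: Solution                  # expert-optimized implementation
    check: Callable[[Solution], bool]    # executable correctness check
    workload: Sequence[int]              # input used for timing


def best_time(fn: Solution, workload: Sequence[int], repeats: int = 5) -> float:
    """Return the fastest wall-clock time over several runs to reduce noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(workload)
        timings.append(time.perf_counter() - start)
    return min(timings)


def evaluate(task: Task, candidate: Solution) -> dict:
    """Gate on correctness first, then report speedups over the baseline."""
    if not task.check(candidate):
        # An incorrect candidate earns no efficiency credit in this sketch.
        return {"correct": False, "speedup": 0.0, "reference_speedup": None}
    t_base = best_time(task.baseline, task.workload)
    t_cand = best_time(candidate, task.workload)
    t_ref = best_time(task.reference, task.workload)
    return {
        "correct": True,
        "speedup": t_base / t_cand,            # candidate vs. baseline
        "reference_speedup": t_base / t_ref,   # expert solution vs. baseline
    }


# Toy task (assumption): sum of squares over a list of integers.
def baseline_sum_squares(xs: Sequence[int]) -> int:
    total = 0
    for x in xs:
        total += x * x
    return total


def reference_sum_squares(xs: Sequence[int]) -> int:
    return sum(x * x for x in xs)


def check_sum_squares(fn: Solution) -> bool:
    return fn([1, 2, 3]) == 14 and fn([]) == 0


toy_task = Task(
    baseline=baseline_sum_squares,
    reference=reference_sum_squares,
    check=check_sum_squares,
    workload=list(range(100_000)),
)

result = evaluate(toy_task, reference_sum_squares)  # a correct candidate
wrong = evaluate(toy_task, lambda xs: 0)            # fails the correctness check
```

Comparing `speedup` against `reference_speedup` is one simple way to express how far a candidate falls short of the expert solution. Checking correctness before timing keeps a fast but wrong program from being scored as efficient.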

Code & Models

Repositories

https://anonymous.4open.science/r/perfcodebench-7CDE
