EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
Wentao Zhang, Jianfeng Wang, Liheng Liang, Yilei Zhao, HaiBin Wen, Zhe Zhao

TL;DR
EvoCodeBench is a new benchmark for evaluating self-evolving LLM-driven coding systems. It measures their iterative improvement, efficiency, and cross-language robustness against a human-programmer baseline.
Contribution
The paper introduces a benchmark that captures inference-time self-evolution, resource costs, and cross-language stability, and compares system performance directly with that of human programmers.
Findings
Self-evolving systems show measurable efficiency gains over time.
The benchmark enables analysis of cross-language robustness and long-tail language stability.
Human performance provides a valuable reference point for evaluating LLM coding systems.
Abstract
As large language models (LLMs) continue to advance in programming tasks, LLM-driven coding systems have evolved from one-shot code generation into complex systems capable of iterative improvement during inference. However, existing code benchmarks primarily emphasize static correctness and implicitly assume fixed model capability during inference. As a result, they do not capture inference-time self-evolution, such as whether accuracy and efficiency improve as an agent iteratively refines its solutions. They also provide limited accounting of resource costs and rarely calibrate model performance against that of human programmers. Moreover, many benchmarks are dominated by high-resource languages, leaving cross-language robustness and long-tail language stability underexplored. Therefore, we present EvoCodeBench, a benchmark for evaluating self-evolving LLM-driven coding systems across…
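The abstract names several quantities a self-evolving coding system can be scored on: whether accuracy improves across refinement rounds, what that improvement costs, how the result compares with human programmers, and how stable it is across languages. The sketch below illustrates one way to compute such quantities. It is not EvoCodeBench's scoring code. The `IterationResult` record, the function names, and the specific aggregates (accuracy gain per 1,000 extra tokens, a human-normalized ratio, and cross-language spread statistics) are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class IterationResult:
    """One refinement round (hypothetical schema, not the paper's)."""
    passed: int   # problems solved after this round
    total: int    # problems attempted
    tokens: int   # tokens consumed in this round (a proxy for resource cost)


def accuracy_trajectory(rounds: list[IterationResult]) -> list[float]:
    """Pass rate after each refinement round; 0.0 for a round with no problems."""
    return [r.passed / r.total if r.total else 0.0 for r in rounds]


def cumulative_cost(rounds: list[IterationResult]) -> list[int]:
    """Running total of tokens spent up to and including each round."""
    totals, running = [], 0
    for r in rounds:
        running += r.tokens
        totals.append(running)
    return totals


def improvement_per_1k_tokens(rounds: list[IterationResult]) -> float:
    """Accuracy gained from the first to the last round per 1,000 extra tokens."""
    if len(rounds) < 2:
        return 0.0
    acc = accuracy_trajectory(rounds)
    extra_tokens = sum(r.tokens for r in rounds[1:])  # cost of refinement only
    if extra_tokens == 0:
        return 0.0
    return (acc[-1] - acc[0]) / (extra_tokens / 1000)


def human_normalized(model_score: float, human_score: float) -> float:
    """Ratio to a human reference score; 1.0 means parity with humans."""
    if human_score <= 0:
        raise ValueError("human reference score must be positive")
    return model_score / human_score


def cross_language_spread(pass_rate_by_lang: dict[str, float]) -> dict[str, float]:
    """Summary statistics of per-language pass rates (lower gap = more stable)."""
    rates = list(pass_rate_by_lang.values())
    return {
        "mean": mean(rates),
        "std": pstdev(rates),
        "worst": min(rates),
        "gap": max(rates) - min(rates),
    }
```

Reporting the gain per extra token, not the final accuracy alone, separates a system that improves cheaply from one that improves only by spending much more compute. The "worst" and "gap" entries highlight long-tail languages that a mean over languages would hide.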
