Empirical Evaluation of Large Language Models in Automated Program Repair
Jiajun Sun, Fengjie Li, Xinzhu Qi, Hongyu Zhang, Jiajun Jiang

TL;DR
This study empirically evaluates large language models for automated program repair across multiple languages and scenarios, revealing that model specialization and prompt design significantly influence repair effectiveness.
Contribution
It provides a comprehensive analysis of modern large-scale LLMs in APR, highlighting the impact of model size, specialization, and prompting strategies on repair performance.
Findings
Model specialization can outperform larger general models.
Repair performance does not increase linearly with model size.
Correct patches often appear early in generation.
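One way to make the third finding concrete is to record, for each bug, the position of the first correct patch in the order the model generated its candidates. The sketch below is only an illustration under stated assumptions: the function names, the bug identifiers, and the correctness flags are all hypothetical and are not taken from the paper, which may measure this differently.

```python
from typing import Mapping, Optional, Sequence


def first_correct_rank(is_correct: Sequence[bool]) -> Optional[int]:
    """Return the 1-based position of the first correct patch, or None if no patch is correct."""
    for rank, ok in enumerate(is_correct, start=1):
        if ok:
            return rank
    return None


def fixed_within_budget(results: Mapping[str, Sequence[bool]], budget: int) -> float:
    """Among bugs that were fixed at all, return the fraction whose first
    correct patch appears within the first `budget` generated candidates."""
    ranks = [first_correct_rank(flags) for flags in results.values()]
    fixed = [r for r in ranks if r is not None]
    if not fixed:
        return 0.0
    return sum(r <= budget for r in fixed) / len(fixed)


# Hypothetical per-bug correctness flags, listed in generation order.
results = {
    "bug-a": [False, True, False, False],
    "bug-b": [True, False, False, False],
    "bug-c": [False, False, False, False],  # never fixed
    "bug-d": [False, False, False, True],
}

if __name__ == "__main__":
    for budget in (1, 2, 4):
        share = fixed_within_budget(results, budget)
        print(f"budget={budget}: {share:.2f} of fixed bugs already repaired")
```

If a large share of fixed bugs is already covered at a small budget, generating many more candidates per bug yields diminishing returns, which is the practical reading of "correct patches often appear early".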
Abstract
The increasing prevalence of software bugs has made automated program repair (APR) a key research focus. Large language models (LLMs) offer new opportunities for APR, but existing studies mostly rely on smaller, earlier-generation models and Java benchmarks. The repair capabilities of modern, large-scale LLMs across diverse languages and scenarios remain underexplored. To address this, we conduct a comprehensive empirical study of four open-source LLMs, CodeLlama, LLaMA, StarCoder, and DeepSeek-Coder, spanning 7B to 33B parameters, diverse architectures, and purposes. We evaluate them across two bug scenarios (enterprise-grade and algorithmic), three languages (Java, C/C++, Python), and four prompting strategies, analyzing over 600K generated patches on six benchmarks. Key findings include: (1) model specialization (e.g., CodeLlama) can outperform larger general-purpose models (e.g.,…
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Advanced Data Storage Technologies
