The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models
Fernando Vallecillos Ruiz, Max Hort, Leon Moonen

TL;DR
This paper explores optimizing automatic program repair by combining multiple output generation and iterative refinement using instruction-tuned LLMs, achieving significant improvements with limited fine-tuning data.
Contribution
It introduces a balanced APR pipeline leveraging instruction-tuned LLMs with limited fine-tuning, demonstrating substantial gains and the importance of iterative strategies for complex benchmarks.
Findings
Up to a 78% increase in plausible patches with minimal fine-tuning data.
Iterative patch generation outperforms single-shot approaches, especially on complex benchmarks.
Overfitting occurs beyond certain fine-tuning thresholds, reducing benefits.
Abstract
Automatic program repair (APR) aims to reduce the manual effort required to identify and fix errors in source code. Before the rise of LLM-based agents, a common strategy was to increase the number of generated patches, sometimes to the thousands, to achieve better repair results on benchmarks. More recently, self-iterative capabilities enabled LLMs to refine patches over multiple rounds guided by feedback. However, the literature often focuses on many iterations and disregards different numbers of outputs. We investigate an APR pipeline that balances these two approaches, the generation of multiple outputs and multiple rounds of iteration, while imposing a limit of 10 total patches per bug. We apply three SOTA instruction-tuned LLMs - DeepSeekCoder-Instruct, Codellama-Instruct, Llama3.1-Instruct - to the APR task. We further fine-tune each model on an APR dataset with three sizes (1K,…
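The trade-off between outputs per round and number of rounds under a fixed budget of 10 patches can be illustrated with a minimal Python sketch. This is not the authors' implementation. The helpers `generate` and `run_tests` are hypothetical callables supplied by the caller, and two points are assumptions made for illustration: that only exact splits of the budget are considered, and that the feedback passed to the next round is the message from the last failing candidate.

```python
from typing import Callable, List, Optional, Tuple

PATCH_BUDGET = 10  # total patches per bug, as stated in the abstract


def budget_splits(budget: int = PATCH_BUDGET) -> List[Tuple[int, int]]:
    """Return (outputs_per_round, rounds) pairs whose product equals the budget.

    Assumption: only exact divisors of the budget are used.
    """
    return [(k, budget // k) for k in range(1, budget + 1) if budget % k == 0]


def repair(
    buggy_code: str,
    generate: Callable[[str, Optional[str], int], List[str]],  # hypothetical model call
    run_tests: Callable[[str], Tuple[bool, str]],  # hypothetical test harness
    outputs_per_round: int,
    rounds: int,
) -> Optional[str]:
    """Generate several candidates per round and iterate with test feedback.

    Returns the first candidate that passes the tests (a plausible patch),
    or None if the budget is exhausted.
    """
    feedback: Optional[str] = None  # no feedback is available in the first round
    for _ in range(rounds):
        candidates = generate(buggy_code, feedback, outputs_per_round)
        for patch in candidates:
            passed, message = run_tests(patch)
            if passed:
                return patch
            # Assumption: the last failing message is the next round's feedback.
            feedback = message
    return None


if __name__ == "__main__":
    print(budget_splits())  # [(1, 10), (2, 5), (5, 2), (10, 1)]
```

Each pair returned by `budget_splits` is one way to spend the same budget, ranging from purely iterative (1 output over 10 rounds) to purely multi-output (10 outputs in 1 round).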
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Software Engineering Research · Scientific Computing and Data Management
Methods: Balanced Selection
