The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models

Fernando Vallecillos Ruiz; Max Hort; Leon Moonen

arXiv:2505.02931·cs.SE·May 7, 2025

TL;DR

This paper explores optimizing automatic program repair (APR) by combining the generation of multiple outputs with iterative refinement using instruction-tuned LLMs, reporting up to 78% more plausible patches with limited fine-tuning data.

Contribution

It introduces a balanced APR pipeline leveraging instruction-tuned LLMs with limited fine-tuning, demonstrating substantial gains and the importance of iterative strategies for complex benchmarks.

Findings

1. Up to 78% increase in plausible patches with minimal fine-tuning data.
2. Iterative patch generation outperforms single-shot approaches, especially on complex benchmarks.
3. Overfitting occurs beyond certain fine-tuning thresholds, reducing benefits.

Abstract

Automatic program repair (APR) aims to reduce the manual effort required to identify and fix errors in source code. Before the rise of LLM-based agents, a common strategy was to increase the number of generated patches, sometimes into the thousands, to achieve better repair results on benchmarks. More recently, self-iterative capabilities have enabled LLMs to refine patches over multiple rounds guided by feedback. However, the literature often focuses on many iterations and disregards different numbers of outputs. We investigate an APR pipeline that balances these two approaches, the generation of multiple outputs and multiple rounds of iteration, while imposing a limit of 10 total patches per bug. We apply three SOTA instruction-tuned LLMs (DeepSeekCoder-Instruct, Codellama-Instruct, and Llama3.1-Instruct) to the APR task. We further fine-tune each model on an APR dataset with three sizes (1K,…
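
The trade-off described in the abstract, several outputs per round versus several rounds of feedback under a cap of 10 patches per bug, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `iterative_repair`, the `generate` and `validate` callables, and the choice to pass only the previous round's failure messages as feedback are all assumptions made for illustration. Only the budget of 10 total patches comes from the abstract.

```python
from typing import Callable, List, Optional, Tuple


def iterative_repair(
    buggy_code: str,
    generate: Callable[[str, List[str], int], List[str]],
    validate: Callable[[str], Tuple[bool, str]],
    outputs_per_round: int,
    budget: int = 10,
) -> Tuple[Optional[str], int]:
    """Search for a plausible patch without exceeding `budget` candidates.

    `generate(code, feedback, k)` is a hypothetical stand-in for an LLM call
    that returns up to k candidate patches. `validate(patch)` is a hypothetical
    stand-in for running the test suite; it returns (passed, message).
    Returns (plausible_patch_or_None, number_of_patches_validated).
    """
    if outputs_per_round < 1:
        raise ValueError("outputs_per_round must be at least 1")

    used = 0
    feedback: List[str] = []  # failure messages from the previous round
    while used < budget:
        # Never request more candidates than the remaining budget allows.
        k = min(outputs_per_round, budget - used)
        candidates = generate(buggy_code, feedback, k)[:k]
        if not candidates:
            break  # the generator produced nothing; stop instead of looping

        feedback = []
        for patch in candidates:
            used += 1
            passed, message = validate(patch)
            if passed:
                return patch, used
            feedback.append(message)

    return None, used
```

With `budget=10`, setting `outputs_per_round=10` corresponds to a single round of ten outputs, while `outputs_per_round=1` corresponds to up to ten rounds of one output each; values in between split the same budget across both dimensions.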

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Software Engineering Research · Scientific Computing and Data Management

Methods: Balanced Selection