Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models

Hanyang Guo; Xiaoheng Xie; Hong-Ning Dai; Peng Di; Yu Zhang; Bishenghui Tao; Zibin Zheng

arXiv:2507.10103·cs.SE·July 15, 2025

TL;DR

This paper introduces SelRepair, a novel approach that combines fine-tuned large language models with a dual retrieval-augmented generation module to improve automated program repair efficiency and accuracy, especially for Java code.

Contribution

The paper presents a new APR method integrating a fine-tuned LLM with a dual RAG module, enhancing retrieval relevance and reducing inference time compared to existing approaches.

Findings

01

Achieves exact match (EM) scores of 26.29% and 17.64% on two Java datasets.

02

Reduces inference time by at least 6.42%.

03

Effectively incorporates semantic and structural code features.
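
For readers unfamiliar with the EM figure quoted above, the following is a minimal sketch of how an exact-match rate is commonly computed for generated patches. The whitespace normalization and the function names `normalize` and `exact_match_rate` are assumptions made for illustration; the paper's own evaluation script may normalize differently.

```python
def normalize(code: str) -> str:
    # Assumption: collapse whitespace runs so that pure formatting
    # differences do not count as a mismatch.
    return " ".join(code.split())


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions identical to their reference patch."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    preds = ["int a = 1;", "return x;"]
    refs = ["int  a = 1;", "return y;"]
    print(exact_match_rate(preds, refs))
```

Under this definition a patch counts only if it reproduces the reference fix token for token, so EM is a strict lower bound on the share of patches that are actually correct.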

Abstract

Automated Program Repair (APR) is essential for ensuring software reliability and quality while enhancing efficiency and reducing developers' workload. Although rule-based and learning-based APR methods have demonstrated their effectiveness, their performance is constrained by the types of defects they can repair, the quality of training data, and the size of model parameters. Recently, Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) have been increasingly adopted in APR tasks. However, current code LLMs and RAG designs neither fully address code repair tasks nor consider code-specific features. To overcome these limitations, we propose SelRepair, a novel APR approach that integrates a fine-tuned LLM with a newly designed dual RAG module. This approach uses a bug-fix pair dataset for fine-tuning and incorporates semantic and syntactic/structural similarity…
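
The idea of retrieving bug-fix pairs by both semantic and structural similarity can be illustrated with a small sketch. This is not SelRepair's implementation: the token-count cosine stands in for a learned embedding similarity, the abstracted-token bigram overlap stands in for a syntactic/structural measure, and the weight `alpha`, the `KEYWORDS` set, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Hypothetical, deliberately tiny keyword set for the demo.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "null", "new"}


def tokens(code: str) -> list[str]:
    return TOKEN.findall(code)


def semantic_sim(a: str, b: str) -> float:
    # Stand-in for embedding similarity: cosine over raw token counts.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def shape(code: str) -> list[str]:
    # Abstract identifiers and numbers away so only the code's shape remains.
    out = []
    for t in tokens(code):
        if t in KEYWORDS:
            out.append(t)
        elif t[0].isdigit():
            out.append("NUM")
        elif t[0].isalpha() or t[0] == "_":
            out.append("ID")
        else:
            out.append(t)
    return out


def structural_sim(a: str, b: str) -> float:
    # Jaccard overlap of bigrams over the abstracted token sequence.
    sa, sb = shape(a), shape(b)
    ga, gb = set(zip(sa, sa[1:])), set(zip(sb, sb[1:]))
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def retrieve(query: str, corpus: list[tuple[str, str]], k: int = 1, alpha: float = 0.5):
    """Rank (buggy, fixed) pairs by a weighted mix of both similarities."""
    def score(pair: tuple[str, str]) -> float:
        buggy = pair[0]
        return alpha * semantic_sim(query, buggy) + (1 - alpha) * structural_sim(query, buggy)
    return sorted(corpus, key=score, reverse=True)[:k]


if __name__ == "__main__":
    corpus = [
        ("if (x = null) return 0;", "if (x == null) return 0;"),
        ("for (int i = 0; i <= n; i++) sum += a[i];",
         "for (int i = 0; i < n; i++) sum += a[i];"),
    ]
    query = "for (int j = 0; j <= len; j++) total += b[j];"
    print(retrieve(query, corpus, k=1))
```

In the demo, the query shares almost no identifiers with the stored loop, yet the two have the same abstracted shape, so the structural score pulls the off-by-one fix to the top. That is the kind of case a purely lexical or purely semantic retriever might rank lower.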
