Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation

Xinyu Gao; Yun Xiong; Deze Wang; Zhenhan Guan; Zejian Shi; Haofen Wang; Shanshan Li

arXiv:2409.15895·cs.SE·September 25, 2024

TL;DR

This paper introduces RRG, a framework that refactors retrieved code to reduce redundancy and align preferences, significantly improving retrieval-augmented code generation efficiency and accuracy.

Contribution

It proposes a novel refactorer module that enhances retrieval-augmented code generation by bridging the preference gap and reducing redundant information.

Findings

1. RRG improves code generation performance by up to 28% on exact match (EM).

2. Refactoring the retrieved code shortens the generator's input and removes noise.

3. RRG bridges the preference gap between the retriever and the generator.

Abstract

Retrieval-augmented code generation uses Large Language Models as the generator and significantly expands their code generation capabilities by supplying relevant code, documentation, and more via the retriever. The current approach suffers from two primary limitations. 1) Information redundancy: the indiscriminate inclusion of redundant information wastes resources and may misguide generators, reducing both their effectiveness and their efficiency. 2) Preference gap: because the two components have different optimization objectives, the retriever strives to procure code with higher ground truth similarity, yet this effort does not substantially benefit the generator. The retriever and the generator may prefer different golden code, and this gap in preference results in a suboptimal design. Additionally, differences in parameterization knowledge acquired during pre-training result in varying preferences…
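To make the retrieve, refactor, generate flow concrete, here is a minimal Python sketch of where a refactoring step sits between a retriever and a generator. Everything in it is an illustrative assumption: the function names (`retrieve`, `refactor`, `build_prompt`), the toy corpus, and above all the line-filtering heuristic, which only stands in for the refactorer module. The module described in this paper is a tuned component, and this sketch does not model its preference alignment.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Toy retriever: rank snippets by Jaccard token overlap with the query."""
    q = tokens(query)

    def score(snippet):
        s = tokens(snippet)
        return len(q & s) / len(q | s) if q | s else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]

def refactor(query, snippets):
    """Toy refactorer (a heuristic, NOT the tuned module from the paper):
    keep each distinct line once, and only if it shares a token with the query."""
    q = tokens(query)
    seen, kept = set(), []
    for snippet in snippets:
        for line in snippet.splitlines():
            key = line.strip()
            if key and key not in seen and q & tokens(key):
                seen.add(key)
                kept.append(line.rstrip())  # preserve indentation
    return "\n".join(kept)

def build_prompt(query, context):
    """Assemble the shortened context and the task into one generator input."""
    return f"# Context:\n{context}\n# Task: {query}\n"

# Hypothetical retrieval corpus with a near-duplicate entry.
corpus = [
    "def read_json(path):\n    import json\n    return json.load(open(path))",
    "def read_json(path):\n    import json\n    # legacy helper\n    return json.load(open(path))",
    "def add(a, b):\n    return a + b",
]
query = "read a json file from path"

retrieved = retrieve(query, corpus)
compact = refactor(query, retrieved)
prompt = build_prompt(query, compact)
```

With this toy data the two near-duplicate snippets are retrieved, and the refactoring step collapses them into a single copy without the unrelated comment, so the generator would receive a shorter prompt than the raw concatenation of retrieved results.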

Taxonomy

Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Model-Driven Software Engineering Techniques