Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation
Xinyu Gao, Yun Xiong, Deze Wang, Zhenhan Guan, Zejian Shi, Haofen Wang, Shanshan Li

TL;DR
This paper introduces RRG, a framework that refactors retrieved code to reduce redundancy and align preferences, significantly improving retrieval-augmented code generation efficiency and accuracy.
Contribution
It proposes a novel refactorer module that enhances retrieval-augmented code generation by bridging the preference gap and reducing redundant information.
Findings
RRG improves code generation performance by up to 28% on exact match (EM).
Refactoring reduces input length and noise.
RRG effectively bridges the preference gap between retriever and generator.
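For readers unfamiliar with the metric in the first finding, the sketch below shows one common way to compute it, assuming EM denotes exact match between generated and reference code. The function name `exact_match` and the whitespace normalization are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference.

    Assumption: runs of whitespace are collapsed before comparison.
    The paper may normalize differently or not at all.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        " ".join(pred.split()) == " ".join(ref.split())
        for pred, ref in zip(predictions, references)
    )
    return hits / len(predictions)


if __name__ == "__main__":
    preds = ["return a + b", "x = 1"]
    refs = ["return a  +  b", "x = 2"]
    print(exact_match(preds, refs))
```

Because EM gives no credit for near misses, it is a strict measure for code generation.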
Abstract
Retrieval-augmented code generation uses Large Language Models as the generator and significantly expands their code generation capabilities by supplying relevant code, documentation, and more via the retriever. The current approach suffers from two primary limitations. (1) Information redundancy: the indiscriminate inclusion of redundant information wastes resources and may misguide generators, reducing both their effectiveness and their efficiency. (2) Preference gap: because the two components have different optimization objectives, the retriever strives to retrieve code with higher similarity to the ground truth, yet this effort does not substantially benefit the generator. The retriever and the generator may prefer different golden code, and this gap in preference results in a suboptimal design. Additionally, differences in parameterized knowledge acquired during pre-training result in varying preferences…
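The retrieve, refactor, generate flow that the abstract describes can be pictured with a toy sketch. Everything here is a hypothetical stand-in: the paper's refactorer is a tuned model, whereas `refactor` below is a simple token-overlap heuristic, and `token_overlap`, `retrieve`, `build_prompt`, and `max_lines` are names invented for illustration. The sketch only shows where a refactoring step sits between retrieval and generation and how it shortens the generator's input.

```python
from typing import List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens (toy similarity, an assumption)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the query."""
    return sorted(corpus, key=lambda s: token_overlap(query, s), reverse=True)[:k]


def refactor(query: str, snippets: List[str], max_lines: int = 3) -> str:
    """Heuristic stand-in for a refactorer (not the paper's method).

    Deduplicates lines, then keeps the max_lines lines that share the most
    tokens with the query, preserving their original order.
    """
    query_tokens = set(query.split())
    lines: List[str] = []
    for snippet in snippets:
        for line in snippet.splitlines():
            stripped = line.strip()
            if stripped and stripped not in lines:
                lines.append(stripped)
    ranked = sorted(
        range(len(lines)),
        key=lambda i: len(query_tokens & set(lines[i].split())),
        reverse=True,
    )
    keep = sorted(ranked[:max_lines])
    return "\n".join(lines[i] for i in keep)


def build_prompt(query: str, context: str) -> str:
    """Assemble the generator input from the compressed context and the task."""
    return f"# Retrieved context (refactored):\n{context}\n# Task:\n{query}\n"


if __name__ == "__main__":
    corpus = [
        "def add(a, b):\n    # add two numbers\n    return a + b",
        "def sub(a, b):\n    return a - b",
        "import os\nprint(os.getcwd())",
    ]
    query = "def add(a, b): return sum"
    hits = retrieve(query, corpus, k=2)
    print(build_prompt(query, refactor(query, hits, max_lines=2)))
```

In this toy version the generator would receive at most `max_lines` lines instead of every retrieved snippet in full, which is the redundancy-reduction idea in miniature. Aligning the compressed context with what the generator actually prefers is the part a heuristic like this cannot capture.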
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Model-Driven Software Engineering Techniques
