An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities
Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

TL;DR
This study systematically evaluates retrieval-augmented frameworks for code generation, demonstrating their benefits, analyzing different integration methods, and discussing trade-offs between performance gains and computational costs.
Contribution
It provides a comprehensive empirical analysis of retrieval-augmented code generation, comparing multiple models and integration techniques, and offers practical recommendations for improving performance.
Findings
Retrieval-augmented frameworks improve code generation performance.
BM25 and Sequential Integration Fusion are effective and convenient methods.
Sketch Filling Fusion can further enhance model accuracy.
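The two techniques named in the findings can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the helper names (`tokenize`, `BM25`, `sequential_fusion`, `build_prompt`), the parameter values `k1=1.5` and `b=0.75`, and the newline separator are all assumptions made for illustration. The sketch also assumes that retrieval matches the query against natural-language descriptions and that sequential integration simply appends the retrieved code to the query before it is passed to a generation model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems use better tokenizers).
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant: log((N - df + 0.5) / (df + 0.5) + 1).
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        # Length normalization term shared by all query terms for document i.
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def top_k(self, query, k=1):
        # Indices of the k highest-scoring documents, best first.
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def sequential_fusion(query, retrieved_code, sep="\n"):
    # Assumed form of sequential integration: query first, retrieved code appended.
    return sep.join([query, *retrieved_code])


# Hypothetical retrieval database of (description, code) pairs.
corpus = [
    ("sort a list of numbers in ascending order", "sorted(nums)"),
    ("read all lines from a text file", "open(path).read().splitlines()"),
    ("reverse a string", "s[::-1]"),
]
retriever = BM25([description for description, _ in corpus])


def build_prompt(query, k=1):
    # Retrieve the k most similar entries and fuse their code with the query.
    hits = retriever.top_k(query, k)
    return sequential_fusion(query, [corpus[i][1] for i in hits])
```

Calling `build_prompt("read lines from a file")` yields the query followed by the code of the best-matching entry; that fused string is what a generation model would receive as input under these assumptions.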
Abstract
Code generation aims to automatically generate code snippets in a specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code generation task to achieve remarkable performance. One main challenge of pre-trained models for code generation is the semantic gap between natural language requirements and source code. To address the issue, prior studies typically adopt a retrieval-augmented framework for the task, where the similar code snippets collected by a retrieval process can be leveraged to help understand the requirements and provide guidance for the generation process. However, there is a lack of systematic study on the application of this framework for code generation, including its impact on the final generated results and the specific usage of the framework. In this…
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis
