Generating Code with the Help of Retrieved Template Functions and Stack Overflow Answers
Dawn Drain, Changran Hu, Chen Wu, Mikhail Breslav, Neel Sundaresan

TL;DR
This paper introduces a retrieval-augmented code generation framework in which a sequence-to-sequence model attends to relevant code snippets and Stack Overflow intent-snippet pairs. The reported gains on code autocompletion are a 4% reduction in cross-entropy loss, a 15% improvement in edit distance, and a 44% increase in BLEU score.
Contribution
The authors propose a retrieval-guided code generator that draws on multiple retrieval models and datasets. Their fusion representation search model is reported as holding the top spot on the CodeSearchNet leaderboard, and they create a new dataset for aligning code with Stack Overflow content.
Findings
4% reduction in cross-entropy loss
15% improvement in edit distance
44% increase in BLEU score
Abstract
We approach the important challenge of code autocompletion as an open-domain task, in which a sequence-to-sequence code generator model is enhanced with the ability to attend to reference code snippets supplied by a semantic code search engine. In this work, we present a novel framework to precisely retrieve template functions as well as intent-snippet pairs and effectively train such a retrieval-guided code generator. To demonstrate the effectiveness of our model designs, we perform extensive experiments with CodeSearchNet which contains template functions and CoNaLa which contains Stack Overflow intent-snippet pairs. We also investigate different retrieval models, including Elasticsearch, DPR, and our fusion representation search model, which currently holds the number one spot on the CodeSearchNet leaderboard. We observe improvements by leveraging multiple database elements and…
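The retrieve-then-generate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a bag-of-words cosine similarity stands in for the retrieval models named above (Elasticsearch, DPR, the fusion representation search model), the three database entries are invented for illustration, and prepending the retrieved snippet to the generator input is only one assumed way of letting a generator see a reference. The generator model itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into identifier-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, database, k=1):
    """Return up to k database entries most similar to the query.

    Toy stand-in for a semantic code search engine; entries with zero
    similarity are dropped.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_generator_input(context, references,
                          sep="\n# --- retrieved reference ---\n"):
    """Place retrieved references before the code context (an assumed format)."""
    return sep.join(references + [context])


# Hypothetical database mixing template functions and an intent-snippet pair.
DATABASE = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def sort_by_key(items, key):\n    return sorted(items, key=key)",
    "# reverse a list in python\nitems[::-1]",
]

context = "def load_json_file(path):"
refs = retrieve(context, DATABASE, k=1)
prompt = build_generator_input(context, refs)
print(prompt)
```

In this toy setup the partially written function retrieves the JSON-reading template, and a sequence-to-sequence generator would receive the combined text as its input. Raising k supplies several references at once, which loosely corresponds to the abstract's mention of leveraging multiple database elements.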
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
