CodeRAG-Bench: Can Retrieval Augment Code Generation?

Zora Zhiruo Wang; Akari Asai; Xinyan Velocity Yu; Frank F. Xu; Yiqing Xie; Graham Neubig; Daniel Fried

arXiv:2406.14497 · cs.SE · February 28, 2025 · 2 cites

TL;DR

This paper introduces CodeRAG-Bench, a comprehensive benchmark to evaluate retrieval-augmented code generation, revealing that while retrieval improves code quality, challenges remain in context retrieval and integration.

Contribution

The paper systematically assesses retrieval-augmented code generation across diverse tasks and retrieval sources, highlights current limitations, and provides a benchmark for future research.

Findings

01

Retrieval improves code generation quality in many scenarios.

02

Retrievers struggle with low lexical overlap and limited context.

03

Generators do not always benefit from additional retrieved contexts.
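
To make findings 01 and 02 concrete, here is a minimal sketch of retrieval-augmented prompting with a purely lexical retriever. It is not the benchmark's implementation: the Jaccard scorer, the function names (`tokenize`, `jaccard`, `retrieve`, `build_prompt`), and the two toy documents are all assumptions chosen for illustration. The point is that a retriever scoring by shared tokens has little to work with when a problem statement and the relevant documentation use different words.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(query, doc):
    """Token-set overlap between query and document, in [0, 1]."""
    q, d = tokenize(query), tokenize(doc)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def retrieve(query, docs, k=1):
    """Return the k documents with the highest lexical overlap."""
    return sorted(docs, key=lambda d: jaccard(query, d), reverse=True)[:k]


def build_prompt(query, docs, k=1):
    """Prepend the retrieved contexts to the problem statement."""
    contexts = "\n\n".join(retrieve(query, docs, k))
    return f"{contexts}\n\n{query}"


# Hypothetical toy corpus standing in for library documentation.
docs = [
    "numpy.cumsum(a, axis=None): return the cumulative sum of the elements along a given axis.",
    "str.join(iterable): concatenate the strings in iterable.",
]

# This query shares content words with the first document ...
high_overlap = "Return the cumulative sum along an axis."
# ... while this one describes the same need in different words,
# so its score against the first document is much lower.
low_overlap = "Compute the running total of a list of numbers."

print(jaccard(high_overlap, docs[0]), jaccard(low_overlap, docs[0]))
print(build_prompt(high_overlap, docs))
```

In this toy setup the second query matches the relevant document only through function words such as "the" and "of", which is the kind of low-lexical-overlap case finding 02 refers to.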

Abstract

While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate generating accurate and functional code. Despite the success of retrieval-augmented generation (RAG) in various text-oriented tasks, its potential for improving code generation remains under-explored. In this work, we conduct a systematic, large-scale analysis by asking: in what scenarios can retrieval benefit code generation models? and what challenges remain? We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks, including basic programming, open-domain, and repository-level problems. We aggregate documents from five sources for models to retrieve contexts: competition solutions,…

Code & Models

Repositories

code-rag-bench/code-rag-bench
Official

Datasets

mteb/CodeRAGStackoverflowPosts
dataset · 640 dl

Videos

CodeRAG-Bench: Can Retrieval Augment Code Generation?

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Algorithms and Data Compression

Methods: Attention Is All You Need · WordPiece · Residual Connection · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Linear Warmup With Linear Decay