When More Retrieval Hurts: Retrieval-Augmented Code Review Generation

Qianru Meng; Xiao Zhang; Zhaochen Ren; Joost Visser

arXiv:2511.05302 · cs.SE · March 26, 2026

Open Access

TL;DR

This paper introduces RARe, a retrieval-augmented framework for code review generation that supplies relevant historical reviews to a large language model as in-context examples. It also reports that retrieving more examples can hurt: a single retrieved example works best.

Contribution

The paper proposes RARe, a novel retrieval-augmented approach for code review generation that effectively incorporates historical reviews as in-context examples for large language models.

Findings

01. RARe outperforms strong baselines on two public benchmarks, reaching BLEU-4 scores of 12.32 and 12.96.

02. Using only the top-1 retrieved example yields the best results.

03. Retrieving more examples can degrade performance because of redundancy and conflicting cues.

Abstract

Code review generation can reduce developer effort by producing concise, reviewer-style feedback for a given code snippet or code change. However, generation-only models often produce generic or off-point reviews, while retrieval-only methods struggle to adapt well to new contexts. In this paper, we view retrieval augmentation for code review as retrieval-augmented in-context learning, where retrieved historical reviews are placed in the input context as examples that guide the model's output. Based on this view, we propose RARe (Retrieval-Augmented Code Reviewer), a framework that retrieves relevant historical reviews from a corpus and conditions a large language model on the retrieved in-context examples. Experiments on two public benchmarks show that RARe outperforms strong baselines and reaches BLEU-4 scores of 12.32 and 12.96. A key finding is that more retrieval can hurt: using…
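The retrieve-then-condition idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ReviewExample` type, the function names, the prompt layout, and the token-overlap (Jaccard) similarity stand in for whatever retriever and prompt format the paper actually uses, which this page does not specify. The default `k=1` mirrors the reported finding that the top-1 example works best.

```python
import re
from dataclasses import dataclass


@dataclass
class ReviewExample:
    """One historical (code change, review) pair. Hypothetical type for illustration."""
    code_change: str
    review: str


def tokenize(text: str) -> set[str]:
    # Lowercased identifier-like tokens; a deliberately simple stand-in tokenizer.
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Token-set overlap; an assumed similarity, not the paper's retriever.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: list[ReviewExample], k: int = 1) -> list[ReviewExample]:
    """Return the k historical examples whose code change is most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda ex: jaccard(q, tokenize(ex.code_change)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, examples: list[ReviewExample]) -> str:
    """Place retrieved reviews in the context as examples, then ask for a new review."""
    parts = [
        f"Code change:\n{ex.code_change}\nReview:\n{ex.review}\n" for ex in examples
    ]
    # The final block is left open so a language model completes the review.
    parts.append(f"Code change:\n{query}\nReview:\n")
    return "\n".join(parts)
```

The resulting prompt string would then be passed to a language model of choice. Raising `k` adds more examples to the context, which is the setting the paper reports as potentially harmful.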

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling