Retrieval-Augmented Code Review Comment Generation

Hyunsun Hong; Jongmoon Baik

arXiv:2506.11591 · cs.SE · June 16, 2025


TL;DR

This paper introduces a retrieval-augmented generation approach to automated code review comment generation. It combines the strengths of generation-based and information retrieval (IR)-based methods to improve accuracy and the recovery of low-frequency tokens.

Contribution

It proposes a retrieval-augmented generation method that conditions pretrained language models on retrieved code review examples, enhancing comment generation quality.
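To make "conditioning on retrieved examples" concrete, here is a minimal sketch, assuming a bag-of-tokens cosine retriever and plain string concatenation as the model input. The page does not state the paper's actual retriever, input format, or model, so the names `retrieve` and `build_input`, the `<code>`/`<exemplar_code>`/`<exemplar_comment>` markers, and the separator are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split code into identifiers and single non-space characters (illustrative only)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", text)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-token count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_code: str, database: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Return the k (code, comment) pairs whose code is most similar to query_code."""
    q = Counter(tokenize(query_code))
    ranked = sorted(database, key=lambda ex: cosine(q, Counter(tokenize(ex[0]))), reverse=True)
    return ranked[:k]


def build_input(query_code: str, database: list[tuple[str, str]], k: int = 2, sep: str = " </s> ") -> str:
    """Concatenate the query code with k retrieved exemplars into one model input string.

    The marker strings and separator are hypothetical; a real system would use
    whatever format its pretrained sequence-to-sequence model was fine-tuned on.
    """
    parts = [f"<code> {query_code}"]
    for code, comment in retrieve(query_code, database, k):
        parts.append(f"<exemplar_code> {code} <exemplar_comment> {comment}")
    return sep.join(parts)


db = [
    ("if (list == null) return;", "Consider using Objects.requireNonNull here."),
    ("for (int i = 0; i < n; i++) sum += a[i];", "Use an enhanced for loop."),
    ("String s = a + b;", "Prefer StringBuilder in loops."),
]
print(build_input("if (items == null) return;", db, k=2))
```

The generator then sees past comments (and any rare tokens they contain) alongside the new code, so it can copy such tokens when they apply and rephrase them when they do not. Raising `k` exposes more exemplars at the cost of a longer input.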

Findings

1. Outperforms existing generation-based and IR-based methods in accuracy.
2. Improves the generation of low-frequency tokens by up to 24%.
3. Performs better as more exemplars are retrieved.
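The page does not say how low-frequency token generation is measured. One plausible way to quantify it is sketched below as an assumption, not as the paper's metric: count how often each token appears in the training comments, treat tokens seen at most `max_count` times (including unseen ones) as rare, and report the fraction of rare reference tokens that also appear in the generated comment. The function name `rare_token_recall` and the whitespace tokenization are hypothetical.

```python
from collections import Counter


def rare_token_recall(
    predictions: list[str],
    references: list[str],
    train_comments: list[str],
    max_count: int = 1,
) -> float:
    """Fraction of rare reference tokens that also appear in the matching prediction.

    A token is "rare" if it occurs at most max_count times in the training
    comments; Counter returns 0 for unseen tokens, so those count as rare too.
    """
    counts = Counter(tok for c in train_comments for tok in c.split())
    hit = total = 0
    for pred, ref in zip(predictions, references):
        pred_tokens = set(pred.split())
        for tok in ref.split():
            if counts[tok] <= max_count:
                total += 1
                hit += tok in pred_tokens  # bool adds as 0 or 1
    return hit / total if total else 0.0


train = ["use a constant here", "use a constant", "rename this variable"]
print(rare_token_recall(["use requireNonNull"], ["use requireNonNull here"], train))
```

Here "requireNonNull" (unseen) and "here" (seen once) are rare; the prediction recovers one of the two. A relative improvement such as the reported figure would then compare this score between two systems, though the exact definition used in the paper is not given on this page.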

Abstract

Automated code review comment generation (RCG) aims to assist developers by automatically producing natural language feedback for code changes. Existing approaches are primarily either generation-based, using pretrained language models, or information retrieval (IR)-based, reusing comments from similar past examples. While generation-based methods leverage code-specific pretraining on large code-natural language corpora to learn semantic relationships between code and natural language, they often struggle to generate low-frequency but semantically important tokens due to their probabilistic nature. In contrast, IR-based methods excel at recovering such rare tokens by copying from existing examples but lack flexibility in adapting to new code contexts, for example when the input code contains identifiers or structures not found in the retrieval database. To bridge the gap between…
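The trade-off described for IR-based methods can be seen in a toy baseline. This minimal sketch assumes Jaccard similarity over token sets as the retriever (a hypothetical choice; the page does not name the IR baselines' similarity measure) and simply returns the comment of the closest past change. The function `ir_comment` and the example data are illustrative only.

```python
import re


def token_set(code: str) -> set[str]:
    """Identifiers and single non-space characters as a set (illustrative tokenization)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ir_comment(query_code: str, database: list[tuple[str, str]]) -> str:
    """IR-style baseline: copy the comment of the most similar past code change verbatim."""
    q = token_set(query_code)
    _, best_comment = max(database, key=lambda ex: jaccard(q, token_set(ex[0])))
    return best_comment


db = [
    ("if (list == null) return;", "Check `list` with Objects.requireNonNull."),
    ("String s = a + b;", "Prefer StringBuilder here."),
]
print(ir_comment("if (items == null) return;", db))
```

The copied comment keeps the rare token `requireNonNull` intact but still names `list` rather than the query's `items`, which is the kind of failure to adapt to new identifiers that the abstract attributes to IR-based methods.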

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling