When Retriever Meets Generator: A Joint Model for Code Comment Generation
Tien P. T. Le, Anh M. T. Bui, Huy N. D. Pham, Alessio Bucaioni, Phuong T. Nguyen

TL;DR
This paper introduces RAGSum, a joint retrieval and generation model built on CodeT5 that improves automatic code comment generation by tightly integrating retrieval and synthesis, leading to significant performance gains across multiple programming languages.
Contribution
The paper proposes a unified retrieval-generation framework with contrastive pre-training and self-refinement, enhancing comment accuracy and efficiency over existing methods.
Findings
Outperforms baseline models on Java, Python, and C benchmarks.
Achieves higher BLEU, METEOR, and ROUGE-L scores.
Demonstrates the effectiveness of coupling retrieval and generation.
Abstract
Automatically generating concise, informative comments for source code can lighten documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then synthesize a new comment, yet retrieval and generation are typically optimized in isolation, allowing irrelevant neighbors to propagate noise downstream. To tackle this issue, we propose RAGSum, a novel approach that targets both effectiveness and efficiency in comment recommendation. RAGSum fuses retrieval and generation using a single CodeT5 backbone. We report preliminary results on a unified retrieval-generation framework built on CodeT5. A contrastive pre-training phase shapes code embeddings for nearest-neighbor search; these weights then seed end-to-end training with a composite loss that (i) rewards accurate top-k retrieval; and (ii)…
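The two retrieval-side ideas in the abstract, contrastive shaping of embeddings and top-k nearest-neighbor search, can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not state the exact contrastive objective, so an in-batch InfoNCE-style loss is assumed here as a common choice, and the function names (`cosine`, `top_k`, `info_nce`), the temperature value, and the 2-D toy vectors are all hypothetical stand-ins for CodeT5 embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query, corpus, k):
    """Indices of the k corpus vectors most similar to the query."""
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, corpus[i]),
                    reverse=True)
    return ranked[:k]

def info_nce(anchors, positives, temperature=0.1):
    """Assumed in-batch contrastive loss (InfoNCE-style).

    positives[i] is the true match for anchors[i]; every other
    positive in the batch serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

# Toy embeddings standing in for encoded code snippets (assumption).
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
neighbors = top_k([0.9, 0.1], corpus, k=2)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is small when each anchor sits next to its own positive and large when the pairs are swapped, which is the property that makes the resulting embedding space usable for nearest-neighbor lookup.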
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
