Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, Yang Liu

TL;DR
This paper introduces a retrieval-augmented generation approach combined with a hybrid GNN and attention-based dynamic graph to improve code summarization, achieving state-of-the-art results on a large-scale C code dataset.
Contribution
The paper proposes a novel retrieval-augmented mechanism and a hybrid GNN with an attention-based dynamic graph to improve code summarization.
Findings
Achieves state-of-the-art BLEU-4, ROUGE-L, and METEOR scores on a large-scale C code dataset.
Introduces a new large-scale C code benchmark dataset.
Demonstrates significant improvements over existing methods.
Abstract
Source code summarization aims to generate natural language summaries from structured code snippets so that code functionality is easier to understand. However, automatic code summarization is challenging due to the complexity of source code and the language gap between source code and natural language summaries. Most previous approaches rely on either retrieval-based methods (which can take advantage of similar examples in the retrieval database but generalize poorly) or generation-based methods (which generalize better but cannot take advantage of similar examples). This paper proposes a novel retrieval-augmented mechanism to combine the benefits of both worlds. Furthermore, to mitigate the limitation of Graph Neural Networks (GNNs) in capturing global graph structure information of source code, we propose a novel attention-based dynamic…
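To make the retrieval-augmented idea concrete, here is a minimal sketch of the retrieval step only: given a query snippet, find the most similar snippet in a database and attach its existing summary as extra input for a downstream generator. This is an illustration under stated assumptions, not the paper's method. The excerpt above does not say how similarity is scored or how retrieved examples are fused into the model, so token-set Jaccard similarity is swapped in as a stand-in, and every name here (`tokenize`, `retrieve`, `build_augmented_input`, `DATABASE`) is hypothetical.

```python
import re


def tokenize(code):
    """Split a code snippet into a set of lowercase identifier/number tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_code, database):
    """Return (score, code, summary) for the most similar database entry.

    Assumption: similarity is plain token overlap; the paper's actual
    retrieval scoring is not given in the excerpt above.
    """
    query_tokens = tokenize(query_code)
    return max(
        ((jaccard(query_tokens, tokenize(code)), code, summary)
         for code, summary in database),
        key=lambda entry: entry[0],
    )


def build_augmented_input(query_code, database):
    """Bundle the query with its nearest neighbour for a downstream generator."""
    score, code, summary = retrieve(query_code, database)
    return {
        "query_code": query_code,
        "retrieved_code": code,
        "retrieved_summary": summary,
        "similarity": score,
    }


# Hypothetical toy retrieval database of (C snippet, summary) pairs.
DATABASE = [
    ("int add(int a, int b) { return a + b; }", "Add two integers."),
    ("void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }",
     "Swap two integers."),
]

example = build_augmented_input("int sum(int a, int b) { return a + b; }", DATABASE)
```

A generation model would then condition on both `query_code` and the retrieved pair, which is the sense in which retrieval and generation are combined. How that conditioning is done is left open here.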
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis
