Bridging Code Graphs and Large Language Models for Better Code Understanding
Zeqi Chen, Zhaoyang Chu, Yi Gui, Feng Guo, Yao Wan, Chuan Shi

TL;DR
This paper introduces CGBridge, a plug-and-play method that enhances large language models with code graph information via an external trainable module, significantly improving code understanding tasks without modifying the LLM architecture.
Contribution
It proposes a novel external bridge module trained on large-scale code graphs to incorporate structural semantics into LLMs for better code understanding.
Findings
Achieves 16.19% and 9.12% improvements in code summarization accuracy.
Yields 9.84% and 38.87% gains in code translation accuracy.
Runs inference over 4x faster than LoRA-tuned models.
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in code intelligence tasks such as code generation, summarization, and translation. However, their reliance on linearized token sequences limits their ability to understand the structural semantics of programs. While prior studies have explored graph-augmented prompting and structure-aware pretraining, they either suffer from prompt length constraints or require task-specific architectural changes that are incompatible with large-scale instruction-following LLMs. To address these limitations, this paper proposes CGBridge, a novel plug-and-play method that enhances LLMs with Code Graph information through an external, trainable Bridge module. CGBridge first pre-trains a code graph encoder via self-supervised learning on a large-scale dataset of 270K code graphs to learn structural code semantics. It then trains an…
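To make the contrast between a linearized token sequence and a code graph concrete, the following is a minimal sketch of turning a snippet into a graph. It is an illustration only: the excerpt above does not say how CGBridge constructs its code graphs, so this sketch assumes the simplest possible choice, a graph with only parent-child edges from Python's abstract syntax tree. The function name `build_ast_graph` is hypothetical and not taken from the paper.

```python
import ast


def build_ast_graph(source: str):
    """Return (nodes, edges) for the AST of `source`.

    nodes: list of AST node-type names, indexed by node id
    edges: list of (parent_id, child_id) pairs

    Assumption: AST parent-child edges only. The paper's code graphs
    may carry other edge types not covered by this sketch.
    """
    tree = ast.parse(source)
    nodes, edges = [], []

    def visit(node, parent_id=None):
        # Assign the next free id to this node and record its type.
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))
        # Recurse into the direct children in source order.
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree)
    return nodes, edges


# Usage: the same function as a flat string versus a graph.
snippet = "def add(a, b):\n    return a + b\n"
nodes, edges = build_ast_graph(snippet)
print(nodes[:2])   # the root node types
print(len(nodes), "nodes,", len(edges), "edges")
```

The graph makes explicit that the `Return` statement belongs to the function body and that the addition is its operand, relations a token sequence leaves implicit. A graph encoder of the kind the abstract describes would consume a structure like this rather than the raw text.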
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling
