Deep Graph-Language Fusion for Structure-Aware Code Generation
Mert Tiftikci, Amir Molzam Sharifloo, Mira Mezini

TL;DR
This paper introduces CGFuse, a framework that integrates graph-based code representations into pre-trained language models at the token level, improving code generation by explicitly capturing code structure.
Contribution
The paper systematically explores deep, token-level fusion of graph features within PLMs, a novel approach to enhancing structural awareness in code generation.
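To make "token-level fusion" concrete, here is a minimal sketch of one way graph features could be mixed into individual token embeddings. It is not CGFuse's published mechanism, which this summary does not describe. Several things are assumptions: the function name `fuse_token_level`, a precomputed `token_to_node` alignment from token positions to graph nodes, node features already projected to the token embedding width, and a scalar sigmoid gate. For each aligned token, the gate computes h' = h + g * n with g = sigmoid(w · [h; n]). Unaligned tokens pass through unchanged.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_token_level(
    token_emb: List[Vector],
    node_feat: List[Vector],
    token_to_node: Dict[int, int],
    gate_w: Vector,
) -> List[Vector]:
    """Hypothetical gated fusion of graph-node features into token embeddings.

    Assumes node_feat vectors have the same width d as token_emb vectors
    and gate_w has length 2 * d (it scores the concatenation [h; n]).
    """
    fused: List[Vector] = []
    for i, h in enumerate(token_emb):
        node = token_to_node.get(i)
        if node is None:
            # Token has no graph node (e.g. punctuation): keep it unchanged.
            fused.append(list(h))
            continue
        n = node_feat[node]
        if len(n) != len(h) or len(gate_w) != 2 * len(h):
            raise ValueError("dimension mismatch between tokens, nodes and gate")
        concat = h + n  # list concatenation [h; n]
        g = sigmoid(sum(w * x for w, x in zip(gate_w, concat)))
        # Gated residual: inject a g-weighted share of the node feature.
        fused.append([hi + g * ni for hi, ni in zip(h, n)])
    return fused


# Usage: tokens 0 and 2 both map to graph node 0; token 1 has no node.
out = fuse_token_level(
    token_emb=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    node_feat=[[2.0, 4.0]],
    token_to_node={0: 0, 2: 0},
    gate_w=[0.0, 0.0, 0.0, 0.0],  # zero weights give g = 0.5
)
print(out)
```

A gated residual keeps the pre-trained token representation intact when the gate is near zero. That is one plausible reason to prefer it over replacing embeddings outright, though the paper's actual design may differ.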
Findings
Up to 16% BLEU improvement in code generation.
Up to 11% CodeBLEU improvement.
Effective integration of graph features at the token level.
Abstract
Pre-trained Language Models (PLMs) have the potential to transform software development tasks. However, despite significant advances, current PLMs struggle to capture the structured and relational attributes of code, such as control flow and data dependencies. This limitation is rooted in an architectural mismatch: whereas code structure is best represented by graphs, transformer-based LLMs process input as sequential token patterns and therefore lack explicit structural awareness. While recent research has explored integrating graph-based code representations using techniques like graph feature extraction, retrieval-augmented generation, and prompt engineering, existing approaches suffer from information loss during dense feature extraction or prompt encoding; notably, the potential of deep, token-level fusion of graph features within model internals has not been systematically…
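The abstract contrasts sequential tokens with graph-structured relations such as data dependencies. The sketch below builds one simple graph of that kind: def-use edges for straight-line Python code, extracted with the standard `ast` module. It illustrates the kind of structure a graph representation makes explicit. It is not the graph construction used by CGFuse, which this summary does not specify. The function name `def_use_edges` is hypothetical. The sketch also ignores control flow, so branches and loops are not modelled correctly.

```python
import ast
from typing import Dict, List, Tuple


def def_use_edges(source: str) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: (def_line, use_line, var) edges for straight-line code.

    Each read of a name is linked to the most recent earlier top-level
    statement that assigned it. Control flow is ignored (simplification).
    """
    last_def: Dict[str, int] = {}
    edges: List[Tuple[int, int, str]] = []
    for stmt in ast.parse(source).body:
        # Record uses first, so `x = x + 1` links to the previous definition of x.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edge = (last_def[node.id], stmt.lineno, node.id)
                    if edge not in edges:
                        edges.append(edge)
        # Then record the names this statement defines.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = stmt.lineno
    return edges


code = "a = 1\nb = a + 2\nc = a * b\nprint(c)\n"
print(def_use_edges(code))
```

A token sequence sees `a` on line 3 only as a name. The edge list instead links it directly back to its definition on line 1, which is the relational information the abstract argues transformers do not capture explicitly.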
