AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection
Zixian Zhang, Takfarinas Saber

TL;DR
This empirical study evaluates how enriching AST-based graph representations with additional structure (CFG, DFG, and FA-AST) affects GNN-based code clone detection. Some enrichments improve accuracy while others hinder it, and GMN performs strongly even on basic ASTs.
Contribution
The paper provides a comprehensive empirical comparison of hybrid AST-based graph representations for GNN-based code clone detection, highlighting which graph structures and models are effective.
Findings
Enriching ASTs with CFG and DFG information improves GNN accuracy.
FA-AST can introduce complexity that harms performance.
On standard ASTs, GMN outperforms the other models.
Abstract
As one of the most detrimental code smells, code clones significantly increase software maintenance costs and heighten vulnerability risks, making their detection a critical challenge in software engineering. Abstract Syntax Trees (ASTs) dominate deep learning-based code clone detection due to their precise syntactic structure representation, but they inherently lack semantic depth. Recent studies address this by enriching AST-based representations with semantic graphs, such as Control Flow Graphs (CFGs) and Data Flow Graphs (DFGs). However, the effectiveness of various enriched AST-based representations and their compatibility with different graph-based machine learning techniques remains an open question, warranting further investigation to unlock their full potential in addressing the complexities of code clone detection. In this paper, we present a comprehensive empirical study to…
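To make the idea of an "enriched" AST concrete, here is a minimal sketch of a hybrid graph built with Python's standard `ast` module. It is not the paper's pipeline: the function name `build_hybrid_graph`, the edge labels (`child`, `next_sibling`, `data_flow`), and the crude write-to-read data-flow heuristic are all assumptions chosen for illustration, and the source language used in the study may differ.

```python
import ast


def build_hybrid_graph(source):
    """Return (nodes, edges) for a toy hybrid graph of `source`.

    nodes: list of AST node type names, indexed by node id.
    edges: list of (src, dst, edge_type) triples.
    Edge types are illustrative assumptions, not the paper's schema.
    """
    tree = ast.parse(source)
    nodes, edges, names = [], [], []

    def visit(node):
        idx = len(nodes)
        nodes.append(type(node).__name__)
        if isinstance(node, ast.Name):
            is_store = isinstance(node.ctx, ast.Store)
            names.append((node.lineno, node.col_offset, idx, node.id, is_store))
        prev = None
        for child in ast.iter_child_nodes(node):
            # Load/Store/Del markers carry no structure of their own; skip them.
            if isinstance(child, ast.expr_context):
                continue
            child_idx = visit(child)
            edges.append((idx, child_idx, "child"))  # plain AST edge
            if prev is not None:
                # Enrichment: link consecutive siblings.
                edges.append((prev, child_idx, "next_sibling"))
            prev = child_idx
        return idx

    visit(tree)

    # Enrichment: connect the most recent write of a variable to each later
    # read, in source order. This ignores branches, scopes, and evaluation
    # order within a single statement, so it is only a rough stand-in for a
    # real data-flow analysis.
    last_write = {}
    for _, _, idx, name, is_store in sorted(names):
        if is_store:
            last_write[name] = idx
        elif name in last_write:
            edges.append((last_write[name], idx, "data_flow"))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_hybrid_graph("a = 1\nb = a + 2\n")
    for src, dst, kind in edges:
        print(f"{nodes[src]}({src}) -[{kind}]-> {nodes[dst]}({dst})")
```

Every extra edge type enlarges the graph a GNN has to process, which is one way to see why an enrichment can add noise as well as signal.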
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research
Methods: Graph Neural Network
