Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree
Wenhan Wang, Ge Li, Bo Ma, Xin Xia, Zhi Jin

TL;DR
This paper introduces a novel method for detecting semantic code clones by constructing flow-augmented abstract syntax trees and applying graph neural networks, significantly improving accuracy over existing AST-based approaches.
Contribution
The paper is the first to apply graph neural networks to flow-augmented ASTs for code clone detection, using explicit control and data flow information in addition to the syntax tree.
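To make the idea of a flow-augmented AST concrete, here is a minimal sketch that turns a syntax tree into a typed graph a GNN could consume. It is an illustration under stated assumptions, not the authors' implementation: it uses Python's standard `ast` module rather than a Java parser, the function name `build_flow_augmented_graph` and the edge labels (`child`, `parent`, `next_sibling`, `next_stmt`) are hypothetical, and only a simple sequential control-flow edge is shown (data-flow edges and the GNN itself are omitted).

```python
import ast


def build_flow_augmented_graph(source):
    """Build a toy flow-augmented AST graph (illustrative assumption only).

    Returns (nodes, edges):
      nodes: list of AST node type names, indexed by node id (pre-order).
      edges: set of (src_id, dst_id, edge_type) triples.
    """
    tree = ast.parse(source)
    nodes = []
    edges = set()
    ids = {}  # maps id(ast_node) -> graph node id

    def visit(node):
        node_id = len(nodes)
        ids[id(node)] = node_id
        nodes.append(type(node).__name__)

        # Skip Load/Store context markers; they carry no structure.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        child_ids = [visit(c) for c in children]

        # Plain AST structure, made bidirectional so messages flow both ways.
        for cid in child_ids:
            edges.add((node_id, cid, "child"))
            edges.add((cid, node_id, "parent"))

        # Order among siblings, which a bare tree edge set does not encode.
        for left, right in zip(child_ids, child_ids[1:]):
            edges.add((left, right, "next_sibling"))

        # Simple control-flow edge: consecutive statements in a block.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            stmt_ids = [ids[id(stmt)] for stmt in body]
            for a, b in zip(stmt_ids, stmt_ids[1:]):
                edges.add((a, b, "next_stmt"))

        return node_id

    visit(tree)
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_flow_augmented_graph("x = 1\ny = x + 2\n")
    for src, dst, kind in sorted(edges):
        print(f"{nodes[src]}#{src} -[{kind}]-> {nodes[dst]}#{dst}")
```

For clone detection, two such graphs (one per code fragment) would be embedded by a GNN and compared; the typed edges are what let the model see flow information that a plain tree does not expose.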
Findings
Outperforms state-of-the-art methods on Java datasets
Effective use of control and data flow in clone detection
First application of GNNs in this domain
Abstract
Code clones are pairs of semantically similar code fragments that may be syntactically similar or different. Detecting code clones can help reduce the cost of software maintenance and prevent bugs. Numerous approaches to detecting code clones have been proposed, but most of them focus on syntactic clones and do not work well on semantic clones with different syntactic features. To detect semantic clones, researchers have adopted deep learning to automatically learn latent semantic features from data. In particular, to leverage grammar information, several approaches have used abstract syntax trees (ASTs) as input and achieved significant progress on code clone benchmarks in various programming languages. However, these AST-based approaches still cannot fully leverage the structural information of code fragments, especially semantic…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Malware Detection Techniques
