Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree

Wenhan Wang; Ge Li; Bo Ma; Xin Xia; Zhi Jin

arXiv:2002.08653 · cs.SE · February 21, 2020 · 33 cites


TL;DR

This paper introduces a novel method for detecting semantic code clones by constructing flow-augmented abstract syntax trees and applying graph neural networks, significantly improving accuracy over existing AST-based approaches.
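A flow-augmented AST is an abstract syntax tree with extra edges for control and data flow, so that a graph neural network can propagate information along execution paths and not only along parent/child links. The following is a minimal sketch of that idea, not the authors' implementation. The paper targets Java, whereas this sketch uses Python's standard `ast` module so it runs without a Java parser. The function `build_fa_ast` and the edge type names (`Child`, `Parent`, `NextSibling`, `NextUse`, `NextStmt`, `IfTrue`, `IfFalse`, `While`, `For`, `WhileBack`, `ForBack`) are hypothetical labels chosen for illustration, and the paper's exact edge set may differ.

```python
import ast


def build_fa_ast(source):
    """Build a flow-augmented AST sketch for Python source.

    Returns (labels, edges): labels[i] is the AST node type of node i, and
    edges is a list of (src, dst, edge_type) triples. Edge type names are
    illustrative assumptions, not the paper's exact schema.
    """
    tree = ast.parse(source)
    labels, edges = [], []
    index_of = {}   # id(ast node) -> graph node index
    last_use = {}   # variable name -> index of its most recently visited Name node

    def visit(node):
        idx = len(labels)
        labels.append(type(node).__name__)
        index_of[id(node)] = idx
        # Rough data-flow proxy: chain successive occurrences of the same
        # variable name in traversal order.
        if isinstance(node, ast.Name):
            if node.id in last_use:
                edges.append((last_use[node.id], idx, "NextUse"))
            last_use[node.id] = idx
        prev = None
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store/Del markers
            c = visit(child)
            edges.append((idx, c, "Child"))    # plain AST edge
            edges.append((c, idx, "Parent"))   # reverse AST edge
            if prev is not None:
                edges.append((prev, c, "NextSibling"))
            prev = c
        return idx

    visit(tree)

    # Control-flow edges, added in a second pass once every node has an index.
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                for a, b in zip(block, block[1:]):
                    edges.append((index_of[id(a)], index_of[id(b)], "NextStmt"))
        if isinstance(node, ast.If):
            test = index_of[id(node.test)]
            edges.append((test, index_of[id(node.body[0])], "IfTrue"))
            if node.orelse:
                edges.append((test, index_of[id(node.orelse[0])], "IfFalse"))
        elif isinstance(node, (ast.While, ast.For)):
            kind = type(node).__name__
            head = index_of[id(node)]
            edges.append((head, index_of[id(node.body[0])], kind))
            edges.append((index_of[id(node.body[-1])], head, kind + "Back"))
    return labels, edges
```

For `x = 0` followed by a `while x < 3:` loop whose body is `x = x + 1`, the sketch chains the four occurrences of `x` with `NextUse` edges and adds a `While`/`WhileBack` pair around the loop body. Note that `NextUse` follows traversal order, which for `x = x + 1` visits the assignment target before the right-hand side, so it is only a rough stand-in for real data-flow analysis.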

Contribution

The paper is the first to apply graph neural networks to flow-augmented ASTs for code clone detection, leveraging control-flow and data-flow information.
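The contribution rests on running a graph neural network over such graphs and comparing the resulting representations of two code fragments. The sketch below shows only the mechanics, under explicit assumptions: the model in the paper is trained, whereas here node embeddings come from a hash of the node label, message passing is a parameter-free mean over incoming neighbour states, edge types are ignored, and `is_clone` with `threshold=0.9` is a hypothetical decision rule. Graphs are passed as a list of node labels plus `(src, dst, edge_type)` triples; all function names are illustrative.

```python
import hashlib
import math

DIM = 16  # embedding width (arbitrary choice for the sketch)


def label_embedding(label):
    """Deterministic pseudo-random vector for a node label.

    Stand-in for a learned embedding: SHA-256 digest bytes scaled to [-1, 1].
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


def propagate(labels, edges, steps=3):
    """Untrained message passing: each node adds the mean of incoming neighbour states.

    Edge types are ignored here; a real model would use type-specific weights.
    """
    h = [label_embedding(label) for label in labels]
    incoming = [[] for _ in labels]
    for src, dst, _edge_type in edges:
        incoming[dst].append(src)  # message flows src -> dst
    for _ in range(steps):
        new_h = []
        for v, hv in enumerate(h):
            if incoming[v]:
                msg = [sum(h[u][k] for u in incoming[v]) / len(incoming[v]) for k in range(DIM)]
            else:
                msg = [0.0] * DIM
            new_h.append([math.tanh(hv[k] + msg[k]) for k in range(DIM)])
        h = new_h
    return h


def graph_vector(labels, edges):
    """Mean-pool node states into one vector for the whole code fragment."""
    h = propagate(labels, edges)
    if not h:
        return [0.0] * DIM
    return [sum(column) / len(h) for column in zip(*h)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_clone(graph_a, graph_b, threshold=0.9):
    """Hypothetical decision rule: a pair is a clone if cosine similarity >= threshold."""
    return cosine(graph_vector(*graph_a), graph_vector(*graph_b)) >= threshold
```

Because nothing here is learned, the scores carry no real semantic signal; in a trained model, learned parameters would take the place of `label_embedding` and the plain averaging step, so that semantically equivalent fragments end up with similar vectors.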

Findings

1. Outperforms state-of-the-art methods on Java datasets.
2. Makes effective use of control and data flow in clone detection.
3. First application of GNNs in this domain.

Abstract

Code clones are pairs of semantically similar code fragments that may be syntactically similar or different. Detecting code clones can help reduce the cost of software maintenance and prevent bugs. Numerous approaches for detecting code clones have been proposed, but most of them focus on syntactic clones and do not work well on semantic clones with different syntactic features. To detect semantic clones, researchers have adopted deep learning for code clone detection to learn latent semantic features automatically from data. In particular, to leverage grammar information, several approaches used abstract syntax trees (ASTs) as input and achieved significant progress on code clone benchmarks in various programming languages. However, these AST-based approaches still cannot fully leverage the structural information of code fragments, especially semantic…


Code & Models

Repositories

jacobwwh/graphmatch_clone
PyTorch · Official

Models

dorkai/codeX-1.0
model · ♡ 8

Datasets

semeru/Code-Code-CloneDetection-BigCloneBench
dataset · 45 downloads


Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Malware Detection Techniques