SEED: Semantic Graph based Deep detection for type-4 clone

Zhipeng Xue; Zhijie Jiang; Chenlin Huang; Rulin Xu; Xiangbing Huang; Liumin Hu

arXiv:2109.12079 · cs.SE · June 29, 2022

TL;DR

SEED introduces a semantic graph-based deep learning method for detecting Type-4 code clones, effectively capturing semantic similarities despite syntactic differences, and significantly outperforms existing approaches.

Contribution

The paper presents a novel semantic graph construction and graph matching approach for Type-4 clone detection, improving semantic representation over prior syntactic methods.

Findings

1. SEED achieves an average F1-score 25.2% higher than the baseline methods.

2. SEED outperforms the baselines on multiple datasets, demonstrating robustness.

3. Focusing the semantic graph on operators and API calls improves detection accuracy.

Abstract

Type-4 clones are pairs of code snippets with similar semantics but different syntax, and they challenge existing code clone detection techniques. Previous studies, however, rely heavily on syntactic structures and textual tokens, which cannot precisely represent the semantic information of code and may introduce non-negligible noise into the detection models. To overcome these limitations, we design a novel semantic graph-based deep detection approach, called SEED. For a pair of code snippets, SEED constructs a semantic graph of each snippet from an intermediate representation, which captures code semantics more precisely than representations based on lexical and syntactic analysis. To accommodate the characteristics of Type-4 clones, the semantic graph is built around operators and API calls rather than all tokens. Then, SEED generates…
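To make the idea of an operator-and-API-call graph concrete, here is a minimal sketch under stated assumptions. The abstract does not specify SEED's IR, node types, or edge types, so this uses a hypothetical three-address toy IR (`(dest, opcode, args)` tuples), keeps only arithmetic operators and `call:<api>` instructions as nodes, and links them with data-flow edges. Copies such as `mov` produce no node and only forward data flow. The `similarity` function is a plain multiset Jaccard overlap standing in for SEED's learned graph matching, which the truncated abstract does not describe. All names (`OPERATORS`, `build_semantic_graph`, `signature`, `similarity`) are illustrative, not from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumption: hypothetical toy IR, each instruction is (dest, opcode, args).
# Operator opcodes and "call:<api>" opcodes become graph nodes; everything
# else (mov, load, store, ...) only forwards data flow.
OPERATORS = {"add", "sub", "mul", "div", "rem", "cmp"}


@dataclass
class SemanticGraph:
    labels: list = field(default_factory=list)  # node id -> opcode label
    edges: set = field(default_factory=set)     # data-flow edges (src id, dst id)


def is_semantic(op: str) -> bool:
    """Keep operators and API calls; drop plain data-movement instructions."""
    return op in OPERATORS or op.startswith("call:")


def build_semantic_graph(ir) -> SemanticGraph:
    g = SemanticGraph()
    producers = {}  # variable -> ids of kept nodes whose results flow into it
    for dest, op, args in ir:
        incoming = set()
        for arg in args:
            incoming |= producers.get(arg, set())
        if is_semantic(op):
            node = len(g.labels)
            g.labels.append(op)
            g.edges |= {(src, node) for src in incoming}
            producers[dest] = {node}
        else:
            # Skipped instruction: pass its operands' producers through to dest.
            producers[dest] = incoming
    return g


def signature(g: SemanticGraph) -> Counter:
    """Multiset of node labels plus labeled (src, dst) edge pairs."""
    edge_labels = Counter((g.labels[s], g.labels[d]) for s, d in g.edges)
    return Counter(g.labels) + edge_labels


def similarity(g1: SemanticGraph, g2: SemanticGraph) -> float:
    """Multiset Jaccard overlap of two graph signatures, in [0, 1]."""
    a, b = signature(g1), signature(g2)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


# r = Math.abs(x - y) * 2
SNIPPET_A_IR = [("t1", "sub", ["x", "y"]),
                ("t2", "call:Math.abs", ["t1"]),
                ("r", "mul", ["t2", "c2"])]
# Same semantics, different syntax: extra temporaries and copies.
SNIPPET_B_IR = [("d", "sub", ["a", "b"]),
                ("d2", "mov", ["d"]),
                ("m", "call:Math.abs", ["d2"]),
                ("tmp", "mov", ["m"]),
                ("out", "mul", ["tmp", "k"])]
# Different semantics: (x + y) * c
SNIPPET_C_IR = [("t", "add", ["x", "y"]),
                ("r", "mul", ["t", "c"])]

if __name__ == "__main__":
    ga, gb, gc = (build_semantic_graph(ir)
                  for ir in (SNIPPET_A_IR, SNIPPET_B_IR, SNIPPET_C_IR))
    print(similarity(ga, gb))  # 1.0: the copies in B leave no trace in the graph
    print(similarity(ga, gc))  # about 0.14: only the "mul" node is shared (1 of 7)
```

Forwarding producers through skipped instructions is what lets snippet B collapse to the same graph as snippet A, even though a token-level comparison would see different identifiers and extra statements. A real pipeline would derive the graph from compiler IR and compare graphs with a learned model rather than this hand-written overlap score.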
