SEED: Semantic Graph based Deep detection for type-4 clone
Zhipeng Xue, Zhijie Jiang, Chenlin Huang, Rulin Xu, Xiangbing Huang, Liumin Hu

TL;DR
SEED introduces a semantic graph-based deep learning method for detecting Type-4 code clones. It captures semantic similarity despite syntactic differences and outperforms existing approaches, with an average F1-Score 25.2% higher than the baselines.
Contribution
The paper presents a novel semantic graph construction and graph matching approach for Type-4 clone detection, improving semantic representation over prior syntactic methods.
Findings
SEED achieves an average of 25.2% higher F1-Score than baseline methods.
SEED outperforms baselines on multiple datasets, demonstrating robustness.
Focusing the semantic graph on operators and API calls improves detection accuracy.
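For reference, the F1-Score used in these findings is the standard harmonic mean of precision $P$ and recall $R$ over predicted clone pairs. Here $TP$, $FP$ and $FN$ count true positives, false positives and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
```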
Abstract
Type-4 clones are pairs of code snippets that have similar semantics but different syntax, and they remain a challenge for existing code clone detection techniques. Previous studies rely heavily on syntactic structures and textual tokens. These cannot precisely represent the semantic information of code and may introduce non-negligible noise into the detection models. To overcome these limitations, we design a novel semantic graph-based deep detection approach called SEED. For a pair of code snippets, SEED constructs a semantic graph of each snippet based on its intermediate representation. This represents code semantics more precisely than representations based on lexical and syntactic analysis. To accommodate the characteristics of Type-4 clones, the semantic graph focuses on operators and API calls instead of all tokens. Then, SEED generates…
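The graph-construction idea in the abstract can be sketched as follows. This is a minimal illustration, not SEED's implementation. It assumes a hypothetical toy three-address IR whose opcodes are strings such as `mul`, `add`, `mov`, `const` and `call:<API>`; the summary does not give SEED's actual IR, node types or edge types.

In the sketch, only operator and API-call instructions become nodes. Copies and constants are dropped, data flow is forwarded through copies, and edges follow def-use relations. The abstract is cut off before it describes the later stages, and SEED is a deep (learned) graph-matching approach. A plain multiset Jaccard overlap of node and edge labels therefore stands in for the matching step here. All function and variable names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy IR instruction: (destination, opcode, operands).
# Opcodes like "mul", "add", "mov", "const" and "call:<API>" are assumptions,
# not SEED's actual intermediate representation.
Instr = Tuple[str, str, Tuple[str, ...]]

# Opcodes treated as non-semantic noise and dropped from the graph (assumed set).
NOISE_OPS = {"mov", "const"}


def build_semantic_graph(ir: List[Instr]) -> Tuple[Counter, Counter]:
    """Keep only operator and API-call instructions as nodes and connect them
    by def-use (data-flow) edges. Returns multisets of node labels and of
    (source label, target label) edge pairs, so variable names play no role."""
    labels: List[str] = []      # opcode label of each kept node
    producer = {}               # variable -> index of the kept node that defined it
    edges = []
    for dest, op, args in ir:
        if op in NOISE_OPS:
            # A copy of a tracked value forwards its producer;
            # anything else (constants, copies of inputs) clears it.
            if op == "mov" and args and args[0] in producer:
                producer[dest] = producer[args[0]]
            else:
                producer.pop(dest, None)
            continue
        node = len(labels)
        labels.append(op)
        for arg in args:
            if arg in producer:
                edges.append((labels[producer[arg]], op))
        producer[dest] = node
    return Counter(labels), Counter(edges)


def jaccard(a: Counter, b: Counter) -> float:
    """Multiset Jaccard: sum of element-wise minima over sum of maxima.
    Two empty multisets are treated as identical."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def graph_similarity(ir1: List[Instr], ir2: List[Instr]) -> float:
    """Naive stand-in for a learned graph-matching model: mean of node and edge overlap."""
    nodes1, edges1 = build_semantic_graph(ir1)
    nodes2, edges2 = build_semantic_graph(ir2)
    return (jaccard(nodes1, nodes2) + jaccard(edges1, edges2)) / 2


# Same computation with renamed variables, reordered operands and an extra copy.
ir_a = [
    ("t1", "mul", ("price", "qty")),
    ("t2", "call:Math.round", ("t1",)),
    ("total", "add", ("t2", "tax")),
]
ir_b = [
    ("cost", "mul", ("q", "p")),
    ("tmp", "mov", ("cost",)),
    ("r", "call:Math.round", ("tmp",)),
    ("out", "add", ("tax_amt", "r")),
]
# A different computation.
ir_c = [
    ("t1", "add", ("price", "qty")),
    ("total", "call:Math.abs", ("t1",)),
]

print(graph_similarity(ir_a, ir_b))  # 1.0
print(graph_similarity(ir_a, ir_c))  # 0.125
```

Renaming variables and inserting a copy (`ir_b`) leaves the graph unchanged. This is the kind of syntactic variation that an operator- and API-focused graph is meant to absorb. A naive label overlap like this one would not recognize equivalent code written with different APIs, which is the gap a learned matching model targets.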
