An ensemble learning approach for software semantic clone detection
Min Fu, Gang Luo, Xi Zheng, Tianyi Zhang, Dongjin Yu, Miryung Kim

TL;DR
This paper presents an ensemble learning method utilizing word embeddings to improve the detection of semantic code clones, addressing limitations of previous syntactic and deep learning approaches.
Contribution
It introduces a novel ensemble approach combining word embeddings and machine learning for semantic clone detection, enhancing accuracy over existing methods.
Findings
Significantly improved precision and recall on BigCloneBench
Outperforms SourcererCC and CDLH in semantic clone detection
Effective in identifying functionally similar code without syntactic similarity
Abstract
Code clones are a serious problem in software and can lead to software defects, maintenance overhead, and licensing violations. Therefore, clone detection is important for reducing maintenance effort and improving code quality during software evolution. A variety of clone detection techniques have been proposed to identify similar code in software. However, few of them can efficiently detect semantic clones (functionally similar code without any syntactic resemblance). Recently, several deep learning based clone detectors have been proposed to detect semantic clones. However, these approaches incur high costs in data labelling and model training. In this paper, we propose a novel approach that leverages word embedding and ensemble learning techniques to detect semantic clones. Our evaluation on a commonly used clone benchmark, BigCloneBench, shows that our approach significantly improves…
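The abstract does not say which embedding model or which base learners the ensemble uses, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a tiny hand-written embedding table (`TOY_EMBEDDINGS`), three embedding-based similarity checks with hand-picked thresholds, and a simple majority vote; all names and numbers here are hypothetical and chosen for illustration.

```python
import math
import re

# Hypothetical toy embedding table (assumption): a real detector would learn
# these vectors from a large code corpus instead of hard-coding them.
TOY_EMBEDDINGS = {
    "sum": [1.0, 0.0, 0.0],
    "total": [0.9, 0.1, 0.0],
    "add": [0.9, 0.0, 0.1],
    "for": [0.0, 1.0, 0.0],
    "while": [0.0, 0.9, 0.1],
    "print": [0.0, 0.0, 1.0],
    "log": [0.1, 0.0, 0.9],
}
DIM = 3


def known_vectors(code):
    """Return the embedding of every identifier/keyword found in the table."""
    tokens = re.findall(r"[A-Za-z_]\w*", code.lower())
    return [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]


def mean_vector(vectors):
    """Average a list of vectors component-wise (zero vector if empty)."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def votes(code_a, code_b):
    """Three base detectors; thresholds are hand-picked, not learned."""
    va, vb = known_vectors(code_a), known_vectors(code_b)
    if not va or not vb:
        return [False, False, False]
    ma, mb = mean_vector(va), mean_vector(vb)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(ma, mb)))
    # For each token in A, take its best match in B, then average.
    alignment = sum(max(cosine(x, y) for y in vb) for x in va) / len(va)
    return [cosine(ma, mb) >= 0.8, distance <= 0.5, alignment >= 0.8]


def is_semantic_clone(code_a, code_b):
    """Majority vote over the base detectors."""
    v = votes(code_a, code_b)
    return sum(v) > len(v) / 2


loop_a = "for x in xs: total = total + x"
loop_b = "while i < n: sum = add(sum, a[i])"
logging = "print(msg); log(msg)"
print(is_semantic_clone(loop_a, loop_b))
print(is_semantic_clone(loop_a, logging))
```

The two accumulation loops share almost no tokens, yet their embedded tokens lie close together, which is the kind of pair a purely syntactic detector would tend to miss. A trained ensemble would replace the fixed thresholds with classifiers fitted on labelled clone pairs.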
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research
