A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, Masoud Ekhtiarzadeh

TL;DR
This systematic review screens over 10000 articles down to 136 primary studies on source code similarity measurement and clone detection, highlighting techniques, tools, datasets, applications, and key challenges in the field.
Contribution
It provides a comprehensive classification and analysis of existing approaches, datasets, and tools, identifying gaps and future research directions in code similarity measurement.
Findings
80 software tools using eight techniques across five application domains.
Nearly 49% of the tools work on Java and 37% support C/C++; coverage of other languages is limited.
Only 8 out of 12 datasets are publicly accessible.
Abstract
Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Data Mining and Analysis
