Scalable Program Clone Search Through Spectral Analysis
Tristan Benoit, Jean-Yves Marion, Sébastien Bardin

TL;DR
This paper introduces PSS, a spectral analysis method for efficient, precise, and robust program clone search at the program level, outperforming existing approaches in large repositories.
Contribution
The paper presents a novel spectral analysis technique, PSS, specifically designed for large-scale program clone search, addressing speed, accuracy, and robustness limitations of prior methods.
Findings
PSS achieves high precision in clone detection.
PSS is faster than existing methods on large datasets.
PSS demonstrates robustness against code variations.
Abstract
We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository most similar to the target program. Potential applications include reverse engineering, program clustering, malware lineage, and software theft detection. Recent years have seen a surge in code similarity techniques, yet most of them focus on function-level similarity and function clone search, whereas we are interested in program-level similarity and program clone search. Our study shows that prior similarity approaches are too slow to handle large program repositories, not precise enough, or not robust against slight variations introduced by compilers, source code versions, or light obfuscations. We propose a novel spectral analysis method for program-level…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
