Code Similarity on High Level Programs
M. Miron Bernal, H. Coyote Estrada, J. Figueroa Nazuno

TL;DR
This paper introduces a novel method for assessing code similarity in high-level programs using Fast Dynamic Time Warping on time series representations derived from source code, enabling subsequence detection without feature extraction.
Contribution
It presents a new approach that applies Fast Dynamic Time Warping to time series representations of code, avoiding feature extraction and improving subsequence similarity detection.
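The pipeline can be sketched in Python, under stated assumptions. The operator-to-number table (`OPERATOR_VALUES`) and the helper names `code_to_series` and `banded_dtw` are hypothetical and do not come from the paper, which does not give its exact mapping here. FastDTW itself approximates DTW by solving at coarser resolutions and refining within a radius. This sketch instead uses exact DTW restricted to a Sakoe-Chiba band, which is one common form of local restriction on the warp path.

```python
import math
import re

# HYPOTHETICAL operator-to-value table. The paper builds its time series from the
# operators of the programming language; this exact mapping is an assumption.
OPERATOR_VALUES = {
    "==": 7, "!=": 8, "<=": 11, ">=": 12, "&&": 13, "||": 14,
    "=": 1, "+": 2, "-": 3, "*": 4, "/": 5, "%": 6, "<": 9, ">": 10, "!": 15,
}

# Try longer operators first so ">=" is not read as ">" followed by "=".
_OPERATOR_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATOR_VALUES, key=len, reverse=True))
)


def code_to_series(source: str) -> list[float]:
    """Turn source code into a time series: one point per operator, in order."""
    return [float(OPERATOR_VALUES[m.group(0)]) for m in _OPERATOR_RE.finditer(source)]


def banded_dtw(a: list[float], b: list[float], window: int) -> tuple[float, list[tuple[int, int]]]:
    """Exact DTW restricted to a Sakoe-Chiba band |i - j| <= window (not FastDTW).

    Returns the accumulated cost and the warp path as (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach (n, m)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Walk back from (n, m) along the cheapest predecessor to recover the warp path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


original = "x = a + b; if (x >= 10) { y = x * 2; }"
renamed = "total = p + q; if (total >= 10) { r = total * 2; }"
different = "z = a / b; while (z != 0 && k < n) { z = z - 1; }"

s0, s1, s2 = (code_to_series(src) for src in (original, renamed, different))
print(banded_dtw(s0, s1, window=2)[0])  # 0.0: same operators in the same order
print(banded_dtw(s0, s2, window=2)[0])  # a positive distance
```

Only operators are encoded, so renaming identifiers leaves the series unchanged and `original` and `renamed` align at zero cost. The same choice means identifiers and literals are ignored entirely. That is a trade-off of the assumed mapping, not necessarily of the paper's.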
Findings
Effective detection of similar code subsequences.
No need for feature extraction in code similarity analysis.
Experiments indicate that two source codes are similar when their respective time series are similar.
Abstract
This paper presents a new approach to code similarity in high-level programs. Our technique is based on Fast Dynamic Time Warping, which builds a warp path (a relation between points of the two series) under local restrictions. The source code is represented as a time series using the operators of the programming language, which makes the comparison possible. This representation enables the detection of subsequences that correspond to similar code instructions. In contrast with other code similarity algorithms, we do not perform feature extraction. The experiments show that two source codes are similar when their respective time series are similar.
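The paper's exact procedure for subsequence detection is not given here. One standard way to locate a short series inside a longer one is open-begin/open-end ("subsequence") DTW, sketched below under stated assumptions. It reports where a query series, such as an encoded code fragment, aligns most cheaply within a longer series. The function name `subsequence_dtw` and the absolute-difference local cost are hypothetical choices.

```python
import math


def subsequence_dtw(query: list[float], series: list[float]) -> tuple[float, int, int]:
    """Open-begin/open-end DTW: best match of `query` anywhere inside `series`.

    Returns (cost, start, end), with 0-based inclusive indices into `series`.
    """
    if not query or not series:
        raise ValueError("query and series must be non-empty")
    n, m = len(query), len(series)
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # First query point: a match may begin at any position j of the series.
    for j in range(m):
        cost[0][j] = abs(query[0] - series[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]  # vertical step
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))  # diagonal
                candidates.append((cost[i][j - 1], start[i][j - 1]))  # horizontal
            prev_cost, prev_start = min(candidates)
            cost[i][j] = abs(query[i] - series[j]) + prev_cost
            start[i][j] = prev_start  # carry the start index along the path

    # Open end: the match may finish at any position of the series.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


fragment = [1, 2, 12, 1, 4]
program = [3, 5, 1, 2, 2, 12, 1, 4, 8]  # the fragment, with one point repeated
print(subsequence_dtw(fragment, program))  # (0, 2, 7)
```

The repeated value in `program` shows why DTW fits here. The warp path absorbs the stretch, so the fragment still matches at zero cost, spanning positions 2 to 7.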
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
