Detecting Essence Code Clones via Information Theoretic Analysis
Lida Zhao, Shihan Dou, Yutao Hu, Yueming Wu, Jiahui Wu, Chengwei Liu, Lyuye Zhang, Yi Liu, Jun Sun, Xuanjing Huang, and Yang Liu

TL;DR
This paper introduces ECScan, an information-theoretic tool that effectively detects essence code clones by focusing on semantic core logic, outperforming existing methods in real-world software projects.
Contribution
The paper presents ECScan, a novel clone detection approach that emphasizes semantic importance using information theory, addressing limitations of syntactic-based techniques.
Findings
ECScan achieves an average F1-score of 85% in detecting essence clones.
ECScan outperforms existing clone detection tools across various clone types.
The approach demonstrates high scalability and robustness in real-world projects.
Abstract
Code cloning, a widespread practice in software development, involves replicating code fragments to save time but often at the expense of software maintainability and quality. In this paper, we address the specific challenge of detecting "essence clones", a complex subtype of Type-3 clones characterized by sharing critical logic despite different peripheral codes. Traditional techniques often fail to detect essence clones due to their syntactic focus. To overcome this limitation, we introduce ECScan, a novel detection tool that leverages information theory to assess the semantic importance of code lines. By assigning weights to each line based on its information content, ECScan emphasizes core logic over peripheral code differences. Our comprehensive evaluation across various real-world projects shows that ECScan significantly outperforms existing tools in detecting essence clones,…
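The weighting idea in the abstract can be illustrated with a small sketch. This is not ECScan's actual algorithm, whose details are not given here. It assumes a simple unigram model in which a token's self-information is -log2 of its relative frequency in a corpus, a line's weight is the sum of its tokens' self-information, and two fragments are compared by the share of total line weight carried by the lines they have in common. All function names (`token_information`, `line_weight`, `weighted_similarity`) and the sample fragments are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integers, and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(line):
    return TOKEN_RE.findall(line)


def token_information(corpus_lines):
    """Self-information -log2 p(t) of each token under a unigram model."""
    counts = Counter(tok for line in corpus_lines for tok in tokenize(line))
    total = sum(counts.values())
    return {tok: -math.log2(n / total) for tok, n in counts.items()}


def line_weight(line, info, unseen_bits):
    """Line weight = summed self-information of its tokens (an assumption)."""
    return sum(info.get(tok, unseen_bits) for tok in tokenize(line))


def weighted_similarity(frag_a, frag_b, info):
    """Weighted Jaccard over lines: shared line weight / total line weight."""
    unseen = max(info.values(), default=0.0)  # treat unseen tokens as rare

    def normalize(line):
        return " ".join(tokenize(line))

    def weight(lines):
        return sum(line_weight(l, info, unseen) for l in lines)

    a = {normalize(l) for l in frag_a if l.strip()}
    b = {normalize(l) for l in frag_b if l.strip()}
    total = weight(a | b)
    return weight(a & b) / total if total else 0.0


# Two fragments sharing one core line but differing in peripheral lines.
frag_a = ["x = 0", "h = (h * 31 + ord(c)) % 65521", "return h"]
frag_b = ["y = 1", "h = (h * 31 + ord(c)) % 65521", "print(h)"]
info = token_information(frag_a + frag_b)

unweighted = 1 / 5  # one shared line out of five distinct lines
print(weighted_similarity(frag_a, frag_b, info), unweighted)
```

In this toy case the shared line carries most of the information, so the weighted score exceeds the plain line-level Jaccard score. A real detector would need a much larger corpus and a more careful notion of which lines are semantically important.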
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Scientific Computing and Data Management
