HyClone: Bridging LLM Understanding and Dynamic Execution for Semantic Code Clone Detection
Yunhao Liang, Ruixuan Ying, Takuya Taniguchi, Guwen Lyu, Zhe Cui

TL;DR
HyClone introduces a two-stage framework that combines LLM-based screening with execution-based validation to improve the accuracy of semantic code clone detection in Python, addressing the limitations of purely LLM-based methods.
Contribution
This work presents a novel hybrid approach that effectively detects semantic code clones by integrating language model analysis with execution-based validation.
Findings
Significant improvements in precision, recall, and F1-score over baseline methods.
Effective identification of semantic clones despite syntactic differences.
The framework demonstrates robustness in Python code clone detection.
Abstract
Code clone detection is a critical task in software engineering, aimed at identifying duplicated or similar code fragments within or across software systems. Traditional methods often fail to capture functional equivalence, particularly for semantic clones (Type 4), where code fragments implement identical functionality despite differing syntactic structures. Recent advances in large language models (LLMs) have shown promise in understanding code semantics. However, directly applying LLMs to code clone detection yields suboptimal results due to their sensitivity to syntactic differences. To address these challenges, we propose a novel two-stage framework that combines LLM-based screening with execution-based validation for detecting semantic clones in Python programs. In the first stage, an LLM evaluates code pairs to filter out obvious non-clones based on semantic analysis. For pairs…
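The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The LLM screener is replaced by a hypothetical `arity_screen` heuristic (a cheap syntactic filter, not an LLM), and execution-based validation is approximated by running both functions on the same inputs and comparing their return values or raised exception types. All names (`arity_screen`, `load_function`, `run`, `is_semantic_clone`) and the random test-input scheme are assumptions made for illustration.

```python
import ast
import copy
import random
from typing import Any, Callable, Sequence

# Hypothetical Stage 1 interface: True means "keep the pair for Stage 2".
Screener = Callable[[str, str], bool]


def arity_screen(src_a: str, src_b: str) -> bool:
    """Toy stand-in for the LLM screener (NOT an LLM): reject pairs whose
    first function definitions take different numbers of positional args."""
    def arity(src: str) -> int:
        fn = next(n for n in ast.walk(ast.parse(src))
                  if isinstance(n, ast.FunctionDef))
        return len(fn.args.args)
    return arity(src_a) == arity(src_b)


def load_function(src: str, name: str) -> Callable[..., Any]:
    """Execute the source and return the named function.
    This runs untrusted code in-process; a real system would sandbox it."""
    namespace: dict[str, Any] = {}
    exec(src, namespace)
    return namespace[name]


def run(fn: Callable[..., Any], args: tuple) -> tuple[str, Any]:
    """Record the return value, or the exception type if the call raises.
    Arguments are deep-copied so one function cannot mutate the other's input."""
    try:
        return ("ok", fn(*copy.deepcopy(args)))
    except Exception as exc:
        return ("err", type(exc).__name__)


def is_semantic_clone(src_a: str, name_a: str, src_b: str, name_b: str,
                      inputs: Sequence[tuple],
                      screener: Screener = arity_screen) -> bool:
    # Stage 1: cheap screening filters out obvious non-clones.
    if not screener(src_a, src_b):
        return False
    # Stage 2: execution-based validation on shared inputs.
    f = load_function(src_a, name_a)
    g = load_function(src_b, name_b)
    return all(run(f, args) == run(g, args) for args in inputs)


LOOP_SRC = """
def sum_squares(xs):
    total = 0
    for x in xs:
        total += x * x
    return total
"""

GEN_SRC = """
def sq_sum(values):
    return sum(v ** 2 for v in values)
"""

WRONG_SRC = """
def doubled_sum(values):
    return sum(v * 2 for v in values)
"""

PAIR_SRC = """
def add(a, b):
    return a + b
"""

# Fixed edge cases plus seeded random integer lists (an assumed input scheme).
rng = random.Random(0)
test_inputs = [([],), ([1, 2, 3],)] + [
    ([rng.randint(-50, 50) for _ in range(rng.randint(0, 10))],)
    for _ in range(100)
]

print(is_semantic_clone(LOOP_SRC, "sum_squares", GEN_SRC, "sq_sum", test_inputs))        # True
print(is_semantic_clone(LOOP_SRC, "sum_squares", WRONG_SRC, "doubled_sum", test_inputs))  # False
print(is_semantic_clone(LOOP_SRC, "sum_squares", PAIR_SRC, "add", test_inputs))          # False
```

The loop and generator versions differ syntactically but agree on every input, which is the Type 4 situation the abstract describes. The `doubled_sum` variant is caught by execution on `[1, 2, 3]` (14 versus 12), and the two-argument `add` is rejected by the screener without being executed at all. Calling `exec` in-process is acceptable only for a toy; an execution stage for real code would need sandboxing and timeouts.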
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
