Code Clone Detection via an AlphaFold-Inspired Framework
Changguo Jia, Yi Zhan, Tianqi Zhao, Hengzhi Ye, Minghui Zhou

TL;DR
This paper introduces AlphaCC, a code clone detection framework inspired by AlphaFold that models code semantics through sequence-to-structure techniques and outperforms existing methods in accuracy and tool independence.
Contribution
AlphaCC adapts AlphaFold's sequence-to-structure modeling to code clone detection, incorporating a retrieval-augmented multiple sequence alignment (MSA) and modified attention mechanisms for improved semantic understanding.
Findings
AlphaCC outperforms all baselines on three datasets.
It demonstrates strong semantic clone detection capabilities.
It maintains efficiency suitable for large-scale applications.
Abstract
Code clone detection plays a critical role in software maintenance and vulnerability analysis. Numerous methods have been proposed to detect code clones. However, they struggle to extract high-level program semantics directly from a single linear token sequence, leading to unsatisfactory detection performance. AlphaFold has successfully addressed a similar single-sequence challenge in protein structure prediction. Motivated by this success, as well as the sequential similarities between proteins and code, we leverage AlphaFold for code clone detection. In particular, we propose AlphaCC, which represents code fragments as token sequences and adapts AlphaFold's sequence-to-structure modeling capability to infer code semantics. The AlphaCC pipeline has three steps. First, AlphaCC transforms each input…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
