ZC3: Zero-Shot Cross-Language Code Clone Detection
Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Li, Ge Li

TL;DR
ZC3 introduces a zero-shot cross-language code clone detection method that creates language-agnostic representations without relying on parallel data, significantly improving detection accuracy across multiple languages.
Contribution
The paper presents a novel zero-shot approach that combines contrastive learning, domain-aware learning, and cycle consistency learning to detect code clones across programming languages without parallel data.
Findings
Outperforms state-of-the-art baselines by up to 67.12% in MAP score
Effective in aligning representations across multiple languages
Demonstrates robustness on four diverse datasets
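For readers unfamiliar with the MAP (mean average precision) score cited above, the following is a minimal sketch of how it is commonly computed in a retrieval-style setting. This summary does not describe the paper's evaluation protocol, so the setup here is an assumption: each query program ranks a list of candidates, and a candidate counts as relevant if it is a true clone of the query. The function names are illustrative, and the sketch assumes every true clone appears somewhere in the ranked list.

```python
def average_precision(ranked_labels):
    """Average precision for one query.

    ranked_labels[i] is True if the candidate at rank i + 1 is a true clone.
    Assumption: all true clones of the query appear in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_clone in enumerate(ranked_labels, start=1):
        if is_clone:
            hits += 1
            # Precision measured at the rank of each true clone.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_labels):
    """Mean of the per-query average precision values."""
    if not all_ranked_labels:
        return 0.0
    total = sum(average_precision(labels) for labels in all_ranked_labels)
    return total / len(all_ranked_labels)


if __name__ == "__main__":
    # Two hypothetical queries with made-up relevance labels.
    queries = [[True, False, True], [False, True]]
    print(mean_average_precision(queries))
```

A model that ranks true clones nearer the top of each list gets a higher score, which is why MAP suits clone retrieval across languages.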
Abstract
Developers introduce code clones to improve programming productivity. Many existing studies have achieved impressive performance in monolingual code clone detection. However, during software development, more and more developers write semantically equivalent programs in different languages to support different platforms and to help translate projects from one language to another. Considering that collecting cross-language parallel data, especially for low-resource languages, is expensive and time-consuming, how to design an effective cross-language model that does not rely on any parallel data is a significant problem. In this paper, we propose a novel method named ZC3 for Zero-shot Cross-language Code Clone detection. ZC3 designs contrastive snippet prediction to form an isomorphic representation space among different programming languages. Based on this, ZC3 exploits…
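The abstract mentions a contrastive objective but this summary does not give its formulation. As a rough illustration of the general idea only, the sketch below implements a generic InfoNCE-style contrastive loss in plain Python. It is not ZC3's actual objective: the function names, the cosine similarity, and the temperature value are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: pull the positive toward the anchor, push negatives away.

    Illustrative only; the temperature and similarity function are assumptions.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Hypothetical embeddings: the positive matches the anchor, the negative does not.
    print(info_nce_loss(anchor, [1.0, 0.0], [[0.0, 1.0]]))
```

The loss is small when the anchor's embedding is closer to its positive than to any negative. Training on such an objective encourages related snippets to cluster in a shared representation space.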
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
