Evaluation of Contrastive Learning with Various Code Representations for Code Clone Detection
Maksim Zubkov, Egor Spirin, Egor Bogomolov, Timofey Bryksin

TL;DR
This paper evaluates contrastive learning (CL) algorithms combined with different code representations for code clone and plagiarism detection, finding that graph-based models and certain CL algorithms, such as SimCLR and SwAV, are the most effective.
Contribution
It systematically compares popular contrastive learning algorithms with various code representations on clone and plagiarism detection tasks, introducing a new dataset for plagiarism detection.
Findings
Graph-based models outperform others in both tasks.
SimCLR and SwAV yield better results among CL algorithms.
MoCo is the most robust across tasks.
Abstract
Code clones are pairs of code snippets that implement similar functionality. Clone detection is a fundamental branch of automatic source code comprehension, having many applications in refactoring recommendation, plagiarism detection, and code summarization. A particularly interesting case of clone detection is the detection of semantic clones, i.e., code snippets that have the same functionality but significantly differ in implementation. A promising approach to detecting semantic clones is contrastive learning (CL), a machine learning paradigm popular in computer vision but not yet commonly adopted for code processing. Our work aims to evaluate the most popular CL algorithms combined with three source code representations on two tasks. The first task is code clone detection, which we evaluate on the POJ-104 dataset containing implementations of 104 algorithms. The second task is…
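The abstract names contrastive learning as the core technique. As a rough illustration only, the sketch below computes a SimCLR-style NT-Xent (InfoNCE-style) loss in pure Python. It assumes that each code snippet has already been encoded into a vector by some encoder (for example, a graph-based model) and that `z1[i]` and `z2[i]` are embeddings of two views of the same snippet. The function names and the temperature of 0.5 are illustrative assumptions, not values from the paper, and this is not the authors' implementation.

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss (illustrative sketch).

    z1[i] and z2[i] embed two views of the same snippet (a positive pair);
    every other embedding in the batch acts as a negative for it.
    """
    z = z1 + z2  # concatenate into a batch of 2N embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of i's positive partner
        # Temperature-scaled similarities to every other embedding.
        sims = {k: cosine(z[i], z[k]) / temperature
                for k in range(2 * n) if k != i}
        # Numerically stable log-sum-exp over all candidates.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Per-anchor loss: -log(exp(sim_pos) / sum_k exp(sim_k)).
        total += log_denom - sims[pos]
    return total / (2 * n)


# Two snippets, each with two views: matching views vs. swapped views.
good = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
bad = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(round(good, 4), round(bad, 4))
```

A lower loss means each snippet's two views lie closer to each other than to the rest of the batch. In this toy example, matching views give a loss of about 0.24, while swapped views give about 2.24, which is the signal that pushes semantic clones together in embedding space.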
Taxonomy
Topics: Software Engineering Research · Text Readability and Simplification
Methods: Residual Connection · Convolution · Average Pooling · InfoNCE · Residual Block · Max Pooling · 1x1 Convolution · Global Average Pooling
