Zero-shot Evaluation of Deep Learning for Java Code Clone Detection

Thomas S. Heinze

arXiv:2604.13783·cs.SE·April 16, 2026

TL;DR

This paper evaluates how well deep learning models detect Java code clones in zero-shot scenarios. It finds that their generalizability to unseen code is limited and that the conventional tool NiCad outperforms them in this setting.

Contribution

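It replicates five state-of-the-art DL-based clone detectors for Java and evaluates them in a zero-shot setting, training or fine-tuning on one dataset and testing on different datasets and functionalities, which exposes their limitations relative to conventional tools.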

Findings

01

DL models have limited generalizability to unseen code.

02

NiCad outperforms DL models in zero-shot scenarios.

03

Transformers like CodeBERT do not significantly outperform traditional tools.

Abstract

Deep Learning (DL) is becoming increasingly widespread in clone detection, motivated by near-perfect performance on this task. In particular, for semantic code clones, which share only limited syntax but implement the same or similar functionality, Deep Learning appears to outperform conventional tools. In this paper, we investigate the generalizability of DL-based clone detectors for Java. We therefore replicate and evaluate the performance of five state-of-the-art DL-based clone detectors, including Transformers like CodeBERT and single-task models like FA-AST+GMN, in a zero-shot evaluation scenario, where we train/fine-tune and evaluate on different datasets and functionalities. Our experiments demonstrate that the models' generalizability to unseen code is limited. Further analysis reveals that the conventional clone detector NiCad even outperforms the…
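The zero-shot setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation harness. It assumes each labeled pair carries a single functionality label, and the names `LabeledPair`, `zero_shot_split`, and `precision_recall_f1` are hypothetical, as are the sample identifiers. The idea is only that no functionality seen during training may appear in the evaluation set, after which standard precision, recall, and F1 are computed on the held-out pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPair:
    # All fields are hypothetical; real benchmarks use their own schemas.
    left_id: str         # identifier of the first code snippet
    right_id: str        # identifier of the second code snippet
    functionality: str   # functionality both snippets are associated with
    is_clone: bool       # ground-truth label


def zero_shot_split(pairs, held_out):
    """Split pairs so that held-out functionalities never occur in training."""
    train = [p for p in pairs if p.functionality not in held_out]
    test = [p for p in pairs if p.functionality in held_out]
    return train, test


def precision_recall_f1(labels, predictions):
    """Compute precision, recall, and F1 for boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: two functionalities, one of which is held out for evaluation.
pairs = [
    LabeledPair("a1", "a2", "sort", True),
    LabeledPair("a1", "b1", "sort", False),
    LabeledPair("c1", "c2", "copy_file", True),
    LabeledPair("c1", "d1", "copy_file", False),
]
train, test = zero_shot_split(pairs, held_out={"copy_file"})
```

A detector would be trained or fine-tuned on `train` only, and its predictions on `test` would be scored with `precision_recall_f1`. Evaluating on an entirely different dataset, as the abstract also mentions, follows the same principle with the split made at dataset level.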
