Can Code Evaluation Metrics Detect Code Plagiarism?

Fahad Ebrahim; Mike Joy (The University of Warwick)

arXiv:2604.25778·cs.SE·April 29, 2026

TL;DR

This study evaluates whether code evaluation metrics can effectively detect code plagiarism across various modification levels, comparing them with dedicated plagiarism detection tools.

Contribution

It provides an empirical comparison of code evaluation metrics and plagiarism detection tools, revealing their relative effectiveness at different modification levels.

Findings

01

Without preprocessing, Dolos achieves the best overall performance.

02

CrystalBLEU, CodeBLEU, and RUBY outperform JPlag in ranking performance.

03

Performance declines at higher modification levels, but CrystalBLEU remains competitive.

Abstract

Source Code Plagiarism Detection (SCPD) plays an important role in maintaining fairness and academic integrity in software engineering education. Code Evaluation Metrics (CEMs) were developed to assess code generation tasks. However, it remains unclear whether such metrics can reliably detect plagiarism across modification levels of increasing complexity (L1-L6). In this paper, we perform a comparative empirical study using two open-source labelled datasets, ConPlag (raw and template-free versions) and IRPlag. We evaluate five CEMs, namely CodeBLEU, CrystalBLEU, RUBY, Tree Structured Edit Distance (TSED), and CodeBERTScore. Performance is evaluated using threshold-free ranking-based measures at the overall, per-dataset, and per-level granularity. The results are compared against state-of-the-art (SOTA) Source Code Plagiarism Detection Tools (SCPDTs),…
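
To illustrate what a threshold-free ranking-based measure looks like in this setting, the sketch below scores a list of code pairs by similarity and checks how well that ranking separates plagiarised from non-plagiarised pairs. The abstract does not name the specific measures used, so ROC AUC and average precision are assumptions here, chosen because they are common threshold-free choices. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a plagiarised pair is scored above a
    non-plagiarised pair (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at each rank where a plagiarised
    pair appears, with pairs sorted by descending similarity."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive pair")
    return total / hits


# Hypothetical similarity scores from some metric, one per code pair,
# with ground-truth labels (1 = plagiarised, 0 = not plagiarised).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]

print("ROC AUC:", roc_auc(toy_scores, toy_labels))
print("Average precision:", average_precision(toy_scores, toy_labels))
```

Because both measures depend only on the ordering of the scores, they allow metrics with different score ranges to be compared without tuning a decision threshold for each one. Per-level results could be obtained by applying the same functions to the subset of pairs belonging to each modification level.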
