Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications
Hayden Cheers, Yuqing Lin, Shamus P. Smith

TL;DR
This paper evaluates 11 source code plagiarism detection tools for robustness against pervasive plagiarism-hiding modifications. Most are vulnerable to fine-grained structural changes. The token-based tools JPlag and Plaggie prove the most robust, while graph-based tools show potential for greater resilience.
Contribution
The study evaluates 11 existing tools against data sets of simulated undergraduate plagiarism, measuring their robustness to plagiarism-hiding modifications and highlighting the potential of graph-based approaches.
Findings
Most tools are not robust against fine-grained code transformations.
JPlag and Plaggie demonstrate the highest robustness among evaluated tools.
Graph-based tools, especially those that compare programs as program dependence graphs, show potential to be more resilient.
Abstract
Source code plagiarism is a common occurrence in undergraduate computer science education. In order to identify such cases, many source code plagiarism detection tools have been proposed. A source code plagiarism detection tool evaluates pairs of assignment submissions to detect indications of plagiarism. However, a plagiarising student will commonly apply plagiarism-hiding modifications to source code in an attempt to evade detection. Subsequently, prior work has implied that currently available source code plagiarism detection tools are not robust to the application of pervasive plagiarism-hiding modifications. In this article, 11 source code plagiarism detection tools are evaluated for robustness against plagiarism-hiding modifications. The tools are evaluated with data sets of simulated undergraduate plagiarism, constructed with source code modifications representative of…
