Evaluating Software Plagiarism Detection in the Age of AI: Automated Obfuscation and Lessons for Academic Integrity
Timur Sağlam, Larissa Schmid

TL;DR
This study assesses how well software plagiarism detectors and their defense mechanisms withstand automated and AI-generated obfuscation. The defenses significantly improve detection, but AI-generated obfuscation remains a challenge, with consequences for academic integrity.
Contribution
The paper evaluates defense mechanisms against both algorithmic and AI-based obfuscation, measuring their effectiveness on real-world datasets and across diverse attack methods.
Findings
Defense mechanisms significantly improve detection rates.
AI-generated obfuscation remains a challenge for current tools.
Detection improved across more than four million program comparisons.
Abstract
Plagiarism in programming assignments is a persistent issue in computer science education, increasingly complicated by the emergence of automated obfuscation attacks. While software plagiarism detectors are widely used to identify suspicious similarities at scale and are resilient to simple obfuscation techniques, they are vulnerable to advanced obfuscation based on structural modification of program code that preserves the original program behavior. While different defense mechanisms have been proposed to increase resilience against these attacks, their current evaluation is limited to the scope of attacks used and lacks a comprehensive investigation regarding AI-based obfuscation. In this paper, we investigate the resilience of these defense mechanisms against a broad range of automated obfuscation attacks, including both algorithmic and AI-generated methods, and for a wide variety of…
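To illustrate the distinction the abstract draws between simple and structural obfuscation, here is a minimal sketch in Python. It is not the method of the paper or of any particular detector: the functions `normalized_tokens`, `ngrams`, and `similarity`, the n-gram Jaccard measure, and the three sample programs are all assumptions chosen for illustration. The sketch shows why a comparison over normalized tokens is unaffected by renaming and comments, yet loses similarity when behavior-preserving dead code is inserted.

```python
import io
import keyword
import tokenize


def normalized_tokens(source):
    """Turn Python source into a token stream that ignores names,
    literal values, comments, and layout (hypothetical normalization)."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Keep keywords, collapse every identifier to one placeholder.
            result.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            result.append("NUM")
        elif tok.type == tokenize.STRING:
            result.append("STR")
        elif tok.type == tokenize.OP:
            result.append(tok.string)
        # Comments, newlines, and indentation tokens are dropped.
    return result


def ngrams(tokens, n=4):
    """Set of all contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b, n=4):
    """Jaccard similarity of the two programs' token n-gram sets."""
    ga, gb = ngrams(normalized_tokens(a), n), ngrams(normalized_tokens(b), n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)

# Simple obfuscation: renamed identifiers and an added comment.
renamed = (
    "def s(xs):\n"
    "    acc = 0  # running sum\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)

# Structural obfuscation: inserted statements that do not change the result.
padded = (
    "def s(xs):\n"
    "    unused = 1\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        unused = unused * 2\n"
    "        acc += x\n"
    "    return acc\n"
)

print(similarity(original, renamed))  # identical normalized token streams
print(similarity(original, padded))   # lower: inserted code breaks up n-grams
```

In this toy setting the renamed copy is indistinguishable from the original, while the padded copy scores lower even though it computes the same result. Automated tools can apply such insertions at scale, which is the kind of attack the defense mechanisms under evaluation are meant to withstand.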
Taxonomy
Topics: Academic integrity and plagiarism · Law, AI, and Intellectual Property · Artificial Intelligence in Healthcare and Education
