Same Same But Different: Preventing Refactoring Attacks on Software Plagiarism Detection
Robin Maisch, Larissa Schmid, Timur Sağlam, Nils Niehues

TL;DR
This paper introduces an extensible framework that improves the detection of software plagiarism by countering sophisticated refactoring-based obfuscation techniques using code property graphs and graph transformations.
Contribution
The paper presents a novel framework that enhances existing detectors so that they can identify code obfuscated by both AI-based and algorithmic refactoring attacks.
Findings
Significant improvement in detecting obfuscated plagiarized code
Effective against both algorithmic and AI-based obfuscation
Framework is extensible and adaptable to real-world scenarios
Abstract
Plagiarism detection in programming education faces growing challenges due to increasingly sophisticated obfuscation techniques, particularly automated refactoring-based attacks. While code plagiarism detection systems used in educational practice are resilient against basic obfuscation, they struggle against structural modifications that preserve program behavior, especially those introduced by refactoring-based obfuscation. This paper presents a novel and extensible framework that enhances state-of-the-art detectors by leveraging code property graphs and graph transformations to counteract refactoring-based obfuscation. Our comprehensive evaluation on real-world student submissions, obfuscated using both algorithmic and AI-based obfuscation attacks, demonstrates a significant improvement in detecting plagiarized code.
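The general idea of normalizing a program representation before comparison can be illustrated with a small sketch. It is not the paper's implementation: Python's built-in `ast` module stands in for a code property graph, and the single transformation shown (dropping constant assignments to variables that are never read) is an assumed example of a behavior-preserving modification an attacker might insert. The names `DropUnusedAssignments` and `normalized_tokens` are hypothetical.

```python
import ast


class DropUnusedAssignments(ast.NodeTransformer):
    """Delete `name = <constant>` statements whose target is never read.

    Illustrative stand-in for a graph transformation; an assumption,
    not a transformation taken from the paper.
    """

    def __init__(self, used_names):
        self.used_names = used_names

    def visit_Assign(self, node):
        targets = node.targets
        if (len(targets) == 1
                and isinstance(targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and targets[0].id not in self.used_names):
            return None  # returning None removes the statement
        return node


def normalized_tokens(source):
    """Return the node-type sequence of the normalized syntax tree."""
    tree = ast.parse(source)
    # Names that are read somewhere in the program.
    used = {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    tree = DropUnusedAssignments(used).visit(tree)
    # Node types only, so renamed identifiers do not affect the result.
    return [type(n).__name__ for n in ast.walk(tree)
            if isinstance(n, (ast.stmt, ast.expr))]


original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

obfuscated = """
def compute(values):
    unused = 42
    acc = 0
    for v in values:
        acc += v
    return acc
"""

same = normalized_tokens(original) == normalized_tokens(obfuscated)
```

In this sketch the two programs differ in identifier names and in one inserted dead assignment, yet yield the same token sequence after normalization, which is the sequence a token-based detector would then compare. A real framework would need many more transformations and a richer graph than a syntax tree.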
