Plagiarism deterrence for introductory programming
Simon J. Cohen, Michael J. Martin, Chance A. Shipley, Abhishek Kumar, Andrew R. Cohen

TL;DR
This paper introduces a novel class-wide statistical approach and a compression-based similarity detection algorithm to deter plagiarism in introductory programming, providing transparent feedback and improving educational integrity.
Contribution
It proposes a new class-wide statistical characterization method and an automated deterrence system that enhances detection accuracy and transparency over existing pairwise comparison approaches.
Findings
Provides meaningful independence measurements from week one
Improves detection accuracy with compression-based similarity detection
Enhances transparency and student awareness of plagiarism risks
Abstract
Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Currently available solutions consider only pairwise comparisons between student submissions and focus on punitive deterrence. Our approach instead relies on a class-wide statistical characterization that can be clearly and securely shared with students via an intuitive new p-value representing independence of student effort. A pairwise, compression-based similarity detection algorithm captures relationships between assignments more accurately. An automated deterrence system is used to warn…
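To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the compression-based similarity resembles the standard normalized compression distance (NCD), computed here with zlib, and that the class-wide p-value can be approximated by ranking one pair's distance against the distances of all other pairs in the class. The function names `ncd` and `empirical_p_value` are hypothetical and do not come from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 for near-identical inputs,
    near 1 for unrelated ones. Assumed stand-in for the paper's measure."""
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def empirical_p_value(observed: float, class_distances: list[float]) -> float:
    """Fraction of class-wide pairwise distances at or below the observed one,
    with add-one smoothing so the result is never exactly zero.
    Assumed rank-based approximation, not the paper's statistic."""
    at_or_below = sum(1 for d in class_distances if d <= observed)
    return (at_or_below + 1) / (len(class_distances) + 1)
```

Under these assumptions, a small p-value means the pair is closer than nearly every other pair in the class, which is the kind of signal that could be shared with students as a measure of independent effort.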
Taxonomy
Topics: Academic integrity and plagiarism · Online Learning and Analytics · Imbalanced Data Classification Techniques
