Measuring Plagiarism in Introductory Programming Course Assignments

Muhammad Humayoun; Muhammad Adnan Hashmi; Ali Hanzala Khan

arXiv:2205.08520 · cs.CL · May 31, 2022 · 1 citation

TL;DR

This paper presents a framework for detecting plagiarism in introductory programming assignments using token-based similarity methods, achieving high accuracy with an F1 score above 0.95 on both real and synthetic datasets.

Contribution

It introduces a novel framework combining multiple similarity features and evaluates their effectiveness, including the use of artificially generated data to enhance detection accuracy.

Findings

01. F1 score of 0.955 on original data

02. F1 score of 0.971 on synthetic data

03. Artificial data improves detection results

Abstract

Measuring plagiarism in programming assignments is an essential part of the educational process. This paper discusses methods of plagiarism and their detection in introductory programming course assignments written in C++. A small corpus of assignments is made publicly available. A general framework for computing the similarity between a pair of solutions is developed; it uses three token-based similarity methods as features and predicts whether the solution is plagiarized. The importance of each feature is also measured, which in turn ranks the effectiveness of each method. Finally, an artificially generated dataset improves the results compared to the original data. We achieved F1 scores of 0.955 and 0.971 on the original and synthetic datasets, respectively.
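To make the idea of a token-based similarity feature concrete, here is a minimal Python sketch. The abstract does not name the three methods the paper uses, so this example swaps in one common choice, Jaccard similarity over token trigrams, purely as an illustration. The tokenizer, the keyword list, and the function names (`tokenize`, `ngrams`, `jaccard_similarity`) are all hypothetical and are not taken from the paper.

```python
import re

# Crude C++ tokenizer (an assumption for illustration): identifiers,
# numbers, and single punctuation characters. Comments and string
# literals are not handled specially.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")

# Small, incomplete keyword list; anything else that looks like an
# identifier is treated as a user-chosen name.
CPP_KEYWORDS = {"int", "double", "float", "char", "bool", "void",
                "for", "while", "if", "else", "return"}


def tokenize(source):
    """Split source into tokens, mapping user identifiers to 'ID'
    and numeric literals to 'NUM' so that renaming is ignored."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in CPP_KEYWORDS:
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens, n=3):
    """Return the set of token n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(src_a, src_b, n=3):
    """Jaccard similarity between the token n-gram sets of two programs."""
    a, b = ngrams(tokenize(src_a), n), ngrams(tokenize(src_b), n)
    if not a and not b:
        return 1.0  # two empty programs are trivially identical
    return len(a & b) / len(a | b)


original = "int sum = 0; for (int i = 0; i < n; i++) sum += i;"
renamed = "int total = 0; for (int k = 0; k < m; k++) total += k;"
different = "double x = 3.5; while (x > 1) { x = x / 2; }"

print(jaccard_similarity(original, renamed))
print(jaccard_similarity(original, different))
```

Because identifiers are normalized before comparison, a copy that only renames variables scores the same as an exact copy, while a structurally different program scores lower. In a framework like the one described, several such scores per solution pair could serve as input features to a classifier.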

Taxonomy

Topics: Academic integrity and plagiarism · Online Learning and Analytics · Software Engineering Research