A Sampling-based Tool for Plagiarism Detection in Student Texts
T. Kakkonen, N. Myller

TL;DR
AntiPlag is a novel sampling-based plagiarism detection tool that combines web and local document analysis, achieving high accuracy in identifying various forms of student text plagiarism.
Contribution
The paper introduces AntiPlag, a new system that integrates web sampling with hermetic detection, outperforming existing plagiarism detection tools.
Findings
Achieved 95.8% overall accuracy in tests.
Effectively detects verbatim and paraphrased plagiarism.
Outperforms SafeAssignment, TurnitIn, EVE2, and Plagiarism-Finder.
Abstract
This paper introduces AntiPlag, an advanced plagiarism detection tool intended for use on student texts. It is capable of both hermetic detection that scrutinizes only local collections of documents (other students' texts and lecture materials, for example) and web plagiarism detection, in which the aim is to identify instances of plagiarism that have been sourced from the Internet. The main feature of the system is the sampling-based web plagiarism detection, a novel approach to plagiarism detection that is based on combining web and hermetic search technologies. The system uses standard web search engines to locate documents on the Internet that might have been used as sources of plagiarism by the writer of a text. During this sampling phase, the suspected sources are downloaded, converted to ASCII text and saved to the local database so that they can be later processed by using…
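The two phases described in the abstract can be illustrated with a minimal sketch. It is not AntiPlag's implementation: the abstract does not say how query phrases are chosen or how documents are compared, so the fixed-stride phrase sampling, the word 3-gram containment score, the 0.3 threshold, and every name below (`sample_queries`, `containment`, `detect`, the `search` callable) are assumptions made for illustration. The `search` argument stands in for a web search engine plus downloader and is expected to yield `(doc_id, text)` pairs.

```python
import unicodedata
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for "search engine + download": query -> (doc_id, text) pairs.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]


def to_ascii(text: str) -> str:
    """Convert text to plain ASCII, dropping characters with no ASCII form."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of lowercase word n-grams in the text."""
    words = to_ascii(text).lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def sample_queries(text: str, phrase_len: int = 6, step: int = 20) -> List[str]:
    """Assumed sampling rule: take a phrase_len-word phrase every `step` words."""
    words = to_ascii(text).split()
    last_start = max(len(words) - phrase_len + 1, 0)
    return [" ".join(words[i:i + phrase_len]) for i in range(0, last_start, step)]


def containment(suspect: str, source: str, n: int = 3) -> float:
    """Fraction of the suspect's word n-grams that also occur in the source."""
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)


def detect(suspect: str, search: SearchFn, local_db: Dict[str, str],
           threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Sampling phase followed by a hermetic (local-only) comparison."""
    # Sampling phase: store candidate web sources as ASCII in the local collection.
    for query in sample_queries(suspect):
        for doc_id, raw_text in search(query):
            local_db.setdefault(doc_id, to_ascii(raw_text))
    # Hermetic phase: compare the suspect text against every local document.
    scored = [(doc_id, containment(suspect, text)) for doc_id, text in local_db.items()]
    flagged = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Because the web sources land in the same local collection as other students' texts and lecture materials, a single comparison routine covers both cases in this sketch. A containment score is used rather than a symmetric similarity so that a short student text copied from a long source still scores highly.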
Taxonomy
Topics: Academic integrity and plagiarism · Text Readability and Simplification · Software Engineering Research
