Identifying Machine-Paraphrased Plagiarism
Jan Philip Wahle, Terry Ruas, Tomáš Foltýnek, Norman Meuschke, Bela Gipp

TL;DR
This study evaluates machine-learning classifiers and neural language models for detecting machine-paraphrased plagiarism in academic texts. The best-performing model, Longformer, reaches an average F1 score of 81.0% and alleviates shortcomings of traditional text-matching tools.
Contribution
It evaluates five pre-trained word embedding models combined with machine-learning classifiers, together with eight neural language models, on the task of detecting machine-paraphrased text, and provides open data and tools for future research.
Findings
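Longformer achieved an average F1 score of 81.0% (99.7% for SpinBot and 71.6% for SpinnerChief cases), compared with 78.4% and 65.6% for human evaluators on the same tools.
Automated classification alleviates shortcomings of widely used text-matching systems such as Turnitin and PlagScan.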
Open datasets and tools are publicly available.
Abstract
Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine-learning classifiers and eight state-of-the-art neural language models. We analyzed preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best-performing technique, Longformer, achieved an average F1 score of 81.0% (F1=99.7% for SpinBot and F1=71.6% for SpinnerChief cases), while human evaluators achieved F1=78.4% for SpinBot and F1=65.6% for SpinnerChief cases. We show that the automated classification alleviates shortcomings of widely-used text-matching systems, such as Turnitin and PlagScan. To facilitate future research, all…
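The results above are reported as F1 scores. As a reminder of what that metric measures, the following is a minimal sketch of a binary F1 computation, assuming label 1 means "machine-paraphrased" and label 0 means "original". The function name and the toy labels are illustrative only; the abstract does not state how the authors aggregated F1 across datasets and tool configurations, so this is not their evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = machine-paraphrased, 0 = original).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = f1_score(y_true, y_pred)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is also 2/3.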
Taxonomy
Methods: Linear Layer · Weight Decay · Dropout · Linear Warmup With Linear Decay · AdamW · Multi-Head Attention · Attention Is All You Need
