Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level

Mujahid Ali Quidwai; Chunhui Li; Parijat Dube

arXiv:2306.08122 · cs.CL · June 16, 2023 · 5 cites

Open Access

TL;DR

This paper introduces a multi-level NLP-based method for detecting AI-generated plagiarism in academic writing. It reports up to 94% accuracy in classifying human and AI text and offers interpretable metrics at both the sentence and document levels.

Contribution

It presents a novel contrastive learning approach that enhances detection accuracy and interpretability without requiring retraining for new LLMs.

Findings

01 Achieves up to 94% classification accuracy

02 Provides quantifiable, interpretable metrics

03 Improves with advancements in LLM technology

Abstract

The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements,…
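The matching step described in the abstract (comparing sentences from LLM-generated answers against the sentences of a student's response using cosine similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: bag-of-words counts stand in for the learned sentence representations, no contrastive loss is trained, and the helper names, the naive sentence splitter, and the 0.8 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow_vector(sentence):
    """Bag-of-words counts, used here as a stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_scores(student_response, generated_answers):
    """Sentence level: best similarity of each student sentence to any generated sentence."""
    generated = [bow_vector(s) for ans in generated_answers for s in split_sentences(ans)]
    scores = []
    for sent in split_sentences(student_response):
        vec = bow_vector(sent)
        best = max((cosine_similarity(vec, g) for g in generated), default=0.0)
        scores.append((sent, best))
    return scores


def document_score(scores, threshold=0.8):
    """Document level: fraction of sentences at or above the (hypothetical) threshold."""
    if not scores:
        return 0.0
    return sum(1 for _, s in scores if s >= threshold) / len(scores)


# Answers an LLM might give to paraphrased versions of one question (made-up data).
generated_answers = [
    "Photosynthesis converts light energy into chemical energy. It occurs in chloroplasts.",
    "Plants use sunlight to make glucose. Photosynthesis converts light energy into chemical energy.",
]
student = (
    "Photosynthesis converts light energy into chemical energy. "
    "My cat sat on my notes yesterday."
)

scores = sentence_scores(student, generated_answers)
doc_score = document_score(scores)
```

Each entry of `scores` pairs a student sentence with its best match, which a human evaluator can inspect directly, while `doc_score` summarises the whole response in a single number. Pooling answers from several paraphrased prompts widens the set of candidate sentences that the student's text is compared against.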

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education