Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

Yafu Li; Zhilin Wang; Leyang Cui; Wei Bi; Shuming Shi; Yue Zhang

arXiv:2405.12689 · cs.CL · May 30, 2024 · 1 citation

TL;DR

This paper introduces paraphrased text span detection (PTD), a framework that identifies AI-paraphrased spans within a full text, along with a dedicated dataset, PASTED. PTD models prove effective in both in-distribution and out-of-distribution settings.

Contribution

The paper presents a novel paraphrased text span detection method and a new dataset, PASTED, for identifying AI-generated paraphrased segments within texts.

Findings

01. PTD effectively detects paraphrased spans in full texts.
02. Models generalize well to different paraphrasing prompts.
03. Context surrounding paraphrased spans is crucial for detection accuracy.

Abstract

AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Unlike text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding…
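The input/output shape described in the abstract (full text in, one score per sentence out, spans read off the scores) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `split_sentences`, `detect_spans`, the 0.5 threshold, and especially `toy_scorer`, which is a keyword-based stand-in and not the paper's model. A real PTD scorer would be a trained model that sees the sentence together with its surrounding context.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical scorer type: receives ALL sentences plus the index of the one
# to score, so an implementation can use the surrounding context.
Scorer = Callable[[List[str], int], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def detect_spans(
    text: str, scorer: Scorer, threshold: float = 0.5
) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Score every sentence, then merge consecutive flagged sentences into spans.

    Returns the per-sentence scores and a list of inclusive
    (first_sentence_index, last_sentence_index) spans.
    """
    sentences = split_sentences(text)
    scores = [scorer(sentences, i) for i in range(len(sentences))]

    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open span began, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:  # close a span that runs to the end of the text
        spans.append((start, len(sentences) - 1))
    return scores, spans


def toy_scorer(sentences: List[str], index: int) -> float:
    """Placeholder scorer (NOT the paper's model): flags one marker word."""
    return 1.0 if "delve" in sentences[index].lower() else 0.0


if __name__ == "__main__":
    sample = "I went home. We delve into it. We delve again. Then I slept."
    print(detect_spans(sample, toy_scorer))
```

Passing the whole sentence list to the scorer, rather than a single sentence, is a deliberate choice in this sketch: it leaves room for a context-aware scorer, in line with the finding that surrounding context matters for detection.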

Code & Models

Repositories

linzwcs/pasted (PyTorch, official)

Videos

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling