Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text
Yafu Li, Zhilin Wang, Leyang Cui, Wei Bi, Shuming Shi, Yue Zhang

TL;DR
This paper introduces paraphrased text span detection (PTD), a framework that identifies AI-paraphrased spans within a full text. It is supported by a dedicated dataset, PASTED, and is reported to be effective on both in-distribution and out-of-distribution paraphrasing scenarios.
Contribution
The paper presents a novel paraphrased text span detection method and a new dataset, PASTED, for identifying AI-generated paraphrased segments within texts.
Findings
PTD effectively detects paraphrased spans in full texts.
Models generalize well to different paraphrasing prompts.
Context surrounding paraphrased spans is crucial for detection accuracy.
Abstract
AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Different from text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding…
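As a rough illustration of the sentence-level setup described in the abstract, the sketch below turns per-sentence paraphrasing scores into contiguous spans. It is not the authors' implementation: the function name detect_spans, the score_fn callback (standing in for a trained PTD model that sees the full text), and the 0.5 threshold are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer signature: receives the full list of sentences (so it
# can use surrounding context) plus the index of the sentence to score, and
# returns a paraphrasing score in [0, 1]. A real PTD model would go here.
ScoreFn = Callable[[List[str], int], float]


def detect_spans(
    sentences: List[str],
    score_fn: ScoreFn,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (start, end) sentence-index ranges, end inclusive, where
    every sentence scores at or above the threshold."""
    scores = [score_fn(sentences, i) for i in range(len(sentences))]
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new span
        elif start is not None:
            spans.append((start, i - 1))  # close the current span
            start = None
    if start is not None:
        spans.append((start, len(scores) - 1))  # span runs to the end
    return spans


if __name__ == "__main__":
    text = ["First.", "Second.", "Third.", "Fourth.", "Fifth."]
    toy_scores = [0.1, 0.9, 0.8, 0.2, 0.7]  # made-up scores for illustration
    print(detect_spans(text, lambda sents, i: toy_scores[i]))
```

Passing the whole sentence list to the scorer, rather than one sentence at a time, mirrors the abstract's point that PTD takes in the full text; how a real model uses that context is left open here.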
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
