PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks

Yiwei Zha; Rui Min; Shanu Sushmita

arXiv:2511.00416 · cs.CL · November 4, 2025

TL;DR

This paper introduces PADBen, a comprehensive benchmark for evaluating AI text detectors against paraphrase attacks, revealing current detectors' weaknesses and the need for new detection strategies.

Contribution

The paper presents PADBen, the first benchmark systematically assessing detector robustness against paraphrase attacks, including a detailed taxonomy and evaluation of 11 detectors.

Findings

01. Detectors succeed against plagiarism evasion but fail against authorship obfuscation.

02. Current detection methods cannot effectively handle intermediate laundering regions.

03. Fundamental advances are needed beyond existing semantic and stylistic discrimination techniques.

Abstract

While AI-generated text (AIGT) detectors achieve over 90% accuracy on direct LLM outputs, they fail catastrophically against iteratively paraphrased content. We investigate why iteratively paraphrased text (itself AI-generated) evades detection systems designed for AIGT identification. Through intrinsic mechanism analysis, we reveal that iterative paraphrasing creates an intermediate laundering region characterized by semantic displacement with preserved generation patterns, which gives rise to two attack categories: paraphrasing human-authored text (authorship obfuscation) and paraphrasing LLM-generated text (plagiarism evasion). To address these vulnerabilities, we introduce PADBen, the first benchmark systematically evaluating detector robustness against both paraphrase attack scenarios. PADBen comprises a five-type text taxonomy capturing the full trajectory from original content…
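
As a rough illustration of the iterative-paraphrase setting the abstract describes, the sketch below tracks how often a detector flags a text after each paraphrase round. It is not PADBen's code: `iterative_paraphrase`, `detection_rate_by_round`, `toy_paraphrase`, and `toy_detect` are hypothetical names, and the toy paraphraser and detector are simple string-based stand-ins for an LLM paraphraser and a real AIGT detector.

```python
from typing import Callable, List


def iterative_paraphrase(
    text: str, paraphrase: Callable[[str], str], rounds: int
) -> List[str]:
    """Return the chain [text, p(text), p(p(text)), ...] with `rounds` steps."""
    versions = [text]
    for _ in range(rounds):
        versions.append(paraphrase(versions[-1]))
    return versions


def detection_rate_by_round(
    texts: List[str],
    paraphrase: Callable[[str], str],
    detect: Callable[[str], bool],
    rounds: int,
) -> List[float]:
    """Fraction of texts flagged as AI-generated at each paraphrase depth.

    Assumes `texts` is non-empty.
    """
    chains = [iterative_paraphrase(t, paraphrase, rounds) for t in texts]
    return [
        sum(detect(chain[depth]) for chain in chains) / len(chains)
        for depth in range(rounds + 1)
    ]


# Toy stand-ins (assumptions, for illustration only).
def toy_paraphrase(text: str) -> str:
    # Rewrites one occurrence of a "tell-tale" word per round.
    return text.replace("utilize", "use", 1)


def toy_detect(text: str) -> bool:
    # Flags a text as AI-generated if the tell-tale word is still present.
    return "utilize" in text


samples = ["we utilize models and utilize data", "they utilize tools"]
rates = detection_rate_by_round(samples, toy_paraphrase, toy_detect, rounds=2)
print(rates)  # [1.0, 0.5, 0.0]
```

In this toy setup the detection rate falls with each round because the paraphraser removes the only surface cue the detector relies on. Swapping in a real paraphraser and detector would give a per-round robustness curve for either attack scenario, depending on whether the starting texts are human-authored or LLM-generated.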

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection