Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text

Yize Cheng; Vinu Sankar Sadasivan; Mehrdad Saberi; Shoumik Saha; Soheil Feizi

arXiv:2506.07001 · cs.CL · October 31, 2025

TL;DR

This paper introduces a training-free adversarial paraphrasing method that effectively evades AI-generated text detectors by humanizing content, demonstrating high transferability and exposing vulnerabilities in current detection systems.

Contribution

The authors propose a universal, training-free adversarial paraphrasing framework guided by an off-the-shelf LLM to bypass AI-generated text detectors, highlighting the need for more robust detection methods.

Findings

01. Significantly reduces detection rates across multiple detectors
02. Reduces detectors' true positive rate at a 1% false positive rate (T@1%F) by up to 98.96%
03. Largely preserves text quality, with only slight degradation
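Evasion results like these are usually reported as a detector's true positive rate at a fixed, low false positive rate such as 1%: the threshold is calibrated so that at most 1% of human-written texts are flagged, and then one measures how many AI-generated texts are still caught. The following is a minimal sketch of that computation from raw detector scores. The helper name `tpr_at_fpr` and the convention that higher scores mean "more likely AI-generated" are assumptions for illustration, not taken from the paper or its repository.

```python
import math
from typing import Sequence


def tpr_at_fpr(
    human_scores: Sequence[float],
    ai_scores: Sequence[float],
    target_fpr: float = 0.01,
) -> float:
    """Hypothetical helper: TPR on AI texts at a threshold calibrated on human texts.

    Assumes higher detector scores mean "more likely AI-generated".
    """
    if not human_scores or not ai_scores:
        raise ValueError("need at least one human and one AI score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    # Number of human texts we are allowed to flag as AI at this FPR.
    allowed = math.floor(target_fpr * len(human_scores))
    # Threshold = the (allowed + 1)-th highest human score; only scores strictly
    # above it are flagged, so at most `allowed` human texts are false positives.
    threshold = sorted(human_scores, reverse=True)[allowed]
    detected = sum(1 for s in ai_scores if s > threshold)
    return detected / len(ai_scores)


# Toy usage: 100 human scores 0..99, so a 1% FPR flags only scores above 98.
human = list(range(100))
print(tpr_at_fpr(human, [99, 100, 50, 98]))  # 0.5
```

Flagging only scores strictly above the threshold keeps the false positive rate at or below the target even when human scores are tied.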

Abstract

The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across…
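The abstract says the paraphraser works "under the guidance of an AI text detector" without spelling out the mechanism on this page. One plausible reading, sketched here purely as an assumption, is greedy detector-guided decoding: at each step the paraphrasing LLM proposes a few candidate next tokens, and the detector picks the candidate whose continuation looks least AI-generated. `guided_paraphrase`, `propose`, `detector_score`, and `EOS` are hypothetical names. In a real setup `propose` would wrap an instruction-following LLM prompted to paraphrase the input text, and `detector_score` would wrap an off-the-shelf detector.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def guided_paraphrase(
    propose: Callable[[List[str]], List[str]],
    detector_score: Callable[[str], float],
    max_new_tokens: int = 64,
) -> str:
    """Hypothetical sketch of greedy detector-guided decoding.

    propose(prefix): candidate next tokens from the paraphrasing LLM (e.g. its top-k).
    detector_score(text): detector output, higher = "more likely AI-generated".
    """
    out: List[str] = []
    for _ in range(max_new_tokens):
        candidates = propose(out)
        if not candidates:
            break

        def score(tok: str) -> float:
            # Score the text as it would read if this candidate were chosen;
            # choosing EOS leaves the text unchanged.
            text = "".join(out) if tok == EOS else "".join(out) + tok
            return detector_score(text)

        # Among the LLM's own candidates, keep the one the detector finds least AI-like.
        best = min(candidates, key=score)
        if best == EOS:
            break
        out.append(best)
    return "".join(out)


# Toy stand-ins (hypothetical): the "LLM" offers two synonyms for three steps,
# and the "detector" simply counts a word it associates with AI-generated text.
def toy_propose(prefix: List[str]) -> List[str]:
    return [EOS] if len(prefix) >= 3 else ["delve ", "dig "]


def toy_score(text: str) -> float:
    return float(text.count("delve"))


print(guided_paraphrase(toy_propose, toy_score))  # prints "dig dig dig "
```

Restricting the choice to the paraphraser's own top candidates is what keeps the output fluent; the detector only chooses among tokens the LLM already considers plausible. A practical version would operate on token IDs rather than strings and batch the detector calls for all candidates at each step.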

Code & Models

Repositories

chengez/adversarial-paraphrasing · PyTorch · Official

Videos

Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection

Methods: Sparse Evolutionary Training