StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors

Suraj Ranganath; Atharv Ramesh

arXiv:2602.08934 · cs.LG · March 23, 2026

TL;DR

StealthRL is a reinforcement learning framework that generates adversarial paraphrases to evade multiple AI-text detectors at once, exposing robustness weaknesses in current detection methods.

Contribution

This work presents StealthRL, a reinforcement learning approach that trains a paraphrase policy with Group Relative Policy Optimization (GRPO) and LoRA adapters on Qwen3-4B to stress-test AI-text detectors against realistic paraphrasing attacks.
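
GRPO, as commonly described, scores each sampled output relative to the other samples drawn for the same prompt instead of relying on a learned value function. A minimal sketch of that group-relative normalization follows. The function name, the epsilon, and the use of the population standard deviation are illustrative assumptions, not details taken from StealthRL.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per paraphrase sampled for a single
    prompt. Hypothetical helper; not StealthRL's actual code.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


advantages = group_relative_advantages([0.2, 0.5, 0.9])
```

Samples that beat their group's average receive a positive advantage and are reinforced; samples below it are pushed down.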

Findings

01. Achieves near-zero detection on three of the four tested detectors, with a mean TPR@1%FPR of 0.024.
02. Reduces mean AUROC from 0.79 to 0.43, below the 0.5 level expected from random guessing.
03. Attacks transfer to unseen detectors, revealing shared vulnerabilities.
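
Detection rates at a fixed false-positive budget, such as the TPR@1%FPR figure quoted in the abstract, can be computed roughly as sketched here. The function name and the tie-handling rule (flag only scores strictly above the threshold) are assumptions made for illustration; the paper's exact evaluation code may differ.

```python
def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """True-positive rate on AI text at a threshold set from human text.

    Higher scores mean "more likely AI". The threshold is chosen so that
    at most `target_fpr` of the human texts score strictly above it.
    Hypothetical helper for illustration only.
    """
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # permitted false positives
    # At most `allowed` human scores lie strictly above ranked[allowed].
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    flagged = sum(1 for s in ai_scores if s > threshold)
    return flagged / len(ai_scores)


# Toy data: 100 human scores (0..99) and four AI scores.
rate = tpr_at_fpr(list(range(100)), [50, 99, 120, 130])
```

The abstract's 97.6% attack success rate equals 1 - 0.024, which is consistent with reading attack success as the complement of mean TPR@1%FPR, although the excerpt does not state that definition.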

Abstract

AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We introduce StealthRL, a reinforcement learning framework that stress-tests detector robustness under realistic adversarial conditions. StealthRL trains a paraphrase policy against a multi-detector ensemble using Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B, optimizing a composite reward that balances detector evasion with semantic preservation. We evaluate six attack settings (M0-M5) on the full filtered MAGE test pool (15,310 human / 14,656 AI) against four detectors: RoBERTa, Fast-DetectGPT, Binoculars, and MAGE. StealthRL achieves near-zero detection on three of the four detectors and a 0.024 mean TPR@1%FPR, reducing mean AUROC from 0.79 to 0.43 and attaining a 97.6% attack success rate. Critically, attacks transfer to…
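
The abstract describes a composite reward that balances detector evasion with semantic preservation but does not give its form in this excerpt. One way such a reward could look is sketched below. The linear combination, the mean over detectors, the weights, and every name are hypothetical choices, not the paper's formulation.

```python
def composite_reward(detector_scores, similarity,
                     evasion_weight=1.0, semantic_weight=1.0):
    """Illustrative reward: high when detectors are fooled and meaning is kept.

    detector_scores: per-detector probability in [0, 1] that the paraphrase
                     is AI-written (one entry per ensemble member).
    similarity:      semantic similarity in [0, 1] between the original
                     text and the paraphrase.
    All parameters and the weighting scheme are assumptions.
    """
    mean_detection = sum(detector_scores) / len(detector_scores)
    evasion = 1.0 - mean_detection  # reward low detector confidence
    return evasion_weight * evasion + semantic_weight * similarity


# A paraphrase that fools the ensemble and keeps its meaning scores higher
# than one that is still detected.
evaded = composite_reward([0.05, 0.10, 0.02], similarity=0.9)
detected = composite_reward([0.95, 0.90, 0.98], similarity=0.9)
```

Averaging over the ensemble rewards paraphrases that lower every detector's score, which is one plausible way to encourage multi-detector evasion over overfitting to a single detector.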

Code & Models

Models

🤗 suraj-ranganath/StealthRL · model · 427 downloads

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Advanced Neural Network Applications