StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors
Suraj Ranganath, Atharv Ramesh

TL;DR
StealthRL introduces a reinforcement learning framework that generates adversarial paraphrases to effectively evade multiple AI-text detectors, exposing significant robustness vulnerabilities in current detection methods.
Contribution
This work presents StealthRL, a novel reinforcement learning approach using Group Relative Policy Optimization with LoRA adapters to stress-test AI-text detectors against realistic paraphrasing attacks.
Findings
Achieves near-zero detection on three of the four tested detectors.
Reduces mean AUROC from 0.79 to 0.43, below the 0.5 level of a random classifier.
Attacks transfer to unseen detectors, revealing shared vulnerabilities.
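The two metrics quoted on this page, AUROC and TPR at a fixed FPR, can be illustrated with a minimal sketch. It assumes each detector outputs a score where higher means "more likely AI"; the function names and the toy scores are hypothetical and are not taken from the paper's evaluation code.

```python
def auroc(human_scores, ai_scores):
    """Chance that a random AI text outscores a random human text (ties count half)."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Share of AI texts flagged when at most target_fpr of human texts are flagged."""
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts the detector is allowed to flag by mistake.
    allowed = int(target_fpr * len(ranked))
    # Texts scoring strictly above this threshold are flagged as AI.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    return sum(s > threshold for s in ai_scores) / len(ai_scores)


# Toy scores, invented for illustration only.
human = [0.1, 0.2, 0.3, 0.4]
ai = [0.35, 0.5, 0.6, 0.05]
print(auroc(human, ai))
print(tpr_at_fpr(human, ai, target_fpr=0.25))
```

An AUROC of 0.5 corresponds to random ranking, so a value below 0.5 means the detector scores the attacked AI texts lower than human texts more often than not.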
Abstract
AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We introduce StealthRL, a reinforcement learning framework that stress-tests detector robustness under realistic adversarial conditions. StealthRL trains a paraphrase policy against a multi-detector ensemble using Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen3-4B, optimizing a composite reward that balances detector evasion with semantic preservation. We evaluate six attack settings (M0-M5) on the full filtered MAGE test pool (15,310 human / 14,656 AI) against four detectors: RoBERTa, Fast-DetectGPT, Binoculars, and MAGE. StealthRL achieves near-zero detection on three of the four detectors and a 0.024 mean TPR@1%FPR, reducing mean AUROC from 0.79 to 0.43 and attaining a 97.6% attack success rate. Critically, attacks transfer to…
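As a rough illustration of the training signal described in the abstract, the sketch below combines detector evasion and semantic preservation into one reward and then normalizes rewards within a group of sampled paraphrases, as GRPO-style methods do. The weights `w_evade` and `w_sem`, the linear form of the reward, and the use of the population standard deviation are all assumptions made for this example; the abstract does not specify the actual reward terms.

```python
from statistics import mean, pstdev


def composite_reward(detector_probs, semantic_similarity, w_evade=1.0, w_sem=1.0):
    """Hypothetical reward: high when detectors are fooled and meaning is kept.

    detector_probs: each ensemble detector's P(AI) for the paraphrase, in [0, 1].
    semantic_similarity: similarity between source and paraphrase, in [0, 1].
    """
    evasion = 1.0 - mean(detector_probs)
    return w_evade * evasion + w_sem * semantic_similarity


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of paraphrases of the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical paraphrases of one input, scored by two detectors each.
rewards = [
    composite_reward([0.9, 0.8], 0.95),  # faithful but easily detected
    composite_reward([0.2, 0.4], 0.90),  # evasive and faithful
    composite_reward([0.1, 0.1], 0.30),  # evasive but meaning has drifted
]
print(group_relative_advantages(rewards))
```

Paraphrases that beat their group's average reward receive a positive advantage and are reinforced, so no separate value network is needed to provide a baseline.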
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Advanced Neural Network Applications
