MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization
Yongtong Gu, Songze Li, Xia Hu

TL;DR
MASH is a multi-stage framework that evades black-box AI-generated text detectors by transferring human writing style onto AI-generated text, reaching a 92% average attack success rate in experiments spanning 6 datasets and 5 detectors.
Contribution
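Introduces MASH, a multi-stage style humanization method that combines style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement. In black-box settings it outperforms 11 baseline evaders while preserving text quality.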
Findings
MASH achieves an average attack success rate of 92%.
MASH surpasses baseline evaders by 24% on average.
MASH maintains high linguistic quality of generated texts.
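This summary does not define attack success rate. A common reading, assumed here, is the fraction of AI-generated texts that a detector labels as human after the evader has rewritten them. The minimal sketch below illustrates that reading only; `attack_success_rate`, `toy_detector`, and the 0.5 threshold are hypothetical names and values, not taken from the paper.

```python
from typing import Callable, Sequence


def attack_success_rate(
    rewritten_texts: Sequence[str],
    detector: Callable[[str], float],
    threshold: float = 0.5,
) -> float:
    """Fraction of rewritten AI texts the detector scores below `threshold`.

    Assumption: `detector` returns a score in [0, 1], where higher means
    "more likely AI-generated". A text evades detection when its score
    falls below the threshold.
    """
    if not rewritten_texts:
        raise ValueError("need at least one text to compute a rate")
    evaded = sum(1 for text in rewritten_texts if detector(text) < threshold)
    return evaded / len(rewritten_texts)


def toy_detector(text: str) -> float:
    """Hypothetical stand-in for a black-box detector: flags one keyword."""
    return 0.9 if "delve" in text.lower() else 0.1


samples = [
    "We delve into the results below.",
    "Honestly, the results surprised us.",
    "It kind of works, most of the time.",
]
asr = attack_success_rate(samples, toy_detector)
```

In a black-box setting the evader only observes the detector's output, which is why the sketch treats `detector` as an opaque callable.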
Abstract
The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders.…
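The abstract names direct preference optimization (DPO) as the second stage but gives no details here. As a rough illustration, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. How MASH builds its preference pairs is not stated in this summary; treating the more human-like rewrite as the "chosen" response is an assumption, and `dpo_loss` and its parameter names are hypothetical.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log sigmoid(beta * margin), where margin is the difference
    between the policy-vs-reference log-ratio of the chosen response
    and that of the rejected response.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) equals softplus(-x); computed in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy matches the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

The `beta` parameter controls how strongly the policy is pushed away from the reference model; the value 0.1 is a common default, not one reported for MASH.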
