MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

Yongtong Gu; Songze Li; Xia Hu

arXiv:2601.08564 · cs.CR · April 21, 2026

TL;DR

MASH is a multi-stage framework that evades black-box AI-generated text detectors by humanizing AI-written text through style transfer, achieving high attack success rates across 6 datasets and 5 detectors.

Contribution

Introduces MASH, a multi-stage style humanization method that outperforms 11 existing evasion baselines in black-box settings while achieving both high attack success rates and high text quality.

Findings

01

MASH achieves an average attack success rate of 92%.

02

MASH surpasses baseline evaders by 24% on average.

03

MASH maintains high linguistic quality of generated texts.
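
This page does not define the attack success rate in Finding 01. A common definition in the detector-evasion literature is the fraction of attacked AI-generated texts that a detector labels as human-written. The sketch below assumes that definition, a detector that returns the probability that a text is AI-generated, and a 0.5 decision threshold; `attack_success_rate` and the toy detector are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Callable, Sequence


def attack_success_rate(texts: Sequence[str],
                        detector: Callable[[str], float],
                        threshold: float = 0.5) -> float:
    """Fraction of attacked AI-generated texts that the detector labels human.

    Assumption: `detector` returns the probability that a text is
    AI-generated, and a text evades the detector when that score falls
    below `threshold`.
    """
    if not texts:
        raise ValueError("texts must be non-empty")
    # Count texts the detector would classify as human-written.
    evaded = sum(1 for text in texts if detector(text) < threshold)
    return evaded / len(texts)


# Usage with a toy lookup-table "detector" (hypothetical scores).
toy_scores = {"rewrite-1": 0.91, "rewrite-2": 0.12, "rewrite-3": 0.30, "rewrite-4": 0.45}
print(attack_success_rate(list(toy_scores), toy_scores.__getitem__))  # 0.75
```

Under this definition, an average such as the one in Finding 01 would be a mean of per-dataset, per-detector rates, though the page does not say exactly how the average is taken.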

Abstract

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasion. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective in practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework based on style transfer that evades black-box detectors. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders.…
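
Of the three stages named in the abstract, direct preference optimization (DPO) has a standard published objective. The following is a minimal sketch of that standard DPO loss for one preference pair, computed from summed sequence log-probabilities under the policy being trained and a frozen reference model. Treating the more human-like rewrite as the preferred ("chosen") response is an assumption made for illustration, since the abstract does not describe how MASH builds its preference pairs; `dpo_loss` and `beta=0.1` are illustrative choices, not MASH settings.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response given
    the same prompt. Hypothetically, "chosen" is the more human-like
    rewrite and "rejected" the more detectable one.
    """
    # Implicit reward of each response: how far the policy has moved
    # away from the reference model on it.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss is -log sigmoid(beta * reward margin); it falls as the policy
    # favors the chosen response more strongly than the reference does.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


# Usage: identical policy and reference give margin 0, so the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
```

The reference model keeps the fine-tuned policy anchored to its starting distribution, which is one reason DPO is often applied after a supervised fine-tuning stage, as in the ordering the abstract describes.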