StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

Guantian Zheng

arXiv:2605.00924·cs.LG·May 5, 2026

TL;DR

StyleShield introduces a novel flow matching framework for controllable style transfer in text, exposing the fragility of AIGC detectors by enabling evasion with high success rates.

Contribution

It presents the first flow matching approach for conditional text style transfer in continuous embedding space, with a novel inference paradigm for evasion.

Findings

01  Achieves 94.6% evasion against detectors seen during training

02  Attains >=99% evasion against unseen detectors

03  Maintains high semantic similarity (0.928) during evasion

Abstract

AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: as language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape -- detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text…
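The inference idea named in the abstract, SDEdit applied to a flow matching model, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear interpolation convention (t = 0 is pure noise, t = 1 is data), the fixed-step Euler integrator, the names `sdedit_start`, `euler_integrate`, and `toy_velocity`, and above all the closed-form velocity field, which stands in for the learned, conditioned DiT. The sketch shows only the two-step shape of the procedure: partially noise the source embeddings to an intermediate time `t0`, then integrate the velocity field from `t0` to 1.

```python
import numpy as np


def sdedit_start(x_src, t0, rng):
    """Partially noise source embeddings (assumed convention: t=0 noise, t=1 data).

    t0 = 1.0 returns the source unchanged; t0 = 0.0 returns pure Gaussian noise.
    """
    noise = rng.standard_normal(x_src.shape)
    return (1.0 - t0) * noise + t0 * x_src


def euler_integrate(x, t0, velocity_fn, n_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t0 to 1 with fixed-step Euler."""
    ts = np.linspace(t0, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity_fn(x, t)
    return x


def toy_velocity(target):
    """Hypothetical stand-in for a learned model: a field that flows straight to `target`."""
    def v(x, t):
        # Guard the denominator so the field stays finite as t approaches 1.
        return (target - x) / max(1.0 - t, 1e-6)
    return v


rng = np.random.default_rng(0)
x_src = rng.standard_normal((4, 8))   # 4 token embeddings of width 8 (toy sizes)
target = np.ones((4, 8))              # toy "target style" embeddings

t0 = 0.6                              # larger t0 keeps more of the source in the start point
x_t0 = sdedit_start(x_src, t0, rng)
out = euler_integrate(x_t0, t0, toy_velocity(target))
```

In this toy setting the output always lands on `target`, because the stand-in field ignores where the trajectory started. With a learned velocity field, the choice of `t0` would be expected to trade off how much of the source text survives against how strongly the style changes. Mapping the resulting embeddings back to tokens is a separate step that the sketch does not cover.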
