StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer
Guantian Zheng

TL;DR
StyleShield introduces a novel flow matching framework for controllable style transfer in text, exposing the fragility of AIGC detectors by enabling evasion with high success rates.
Contribution
It presents the first flow matching approach for conditional text style transfer in continuous embedding space, with a novel inference paradigm for evasion.
Findings
Achieves 94.6% evasion against the detectors used during training
Attains >=99% evasion against unseen detectors
Maintains high semantic similarity of 0.928 during evasion
Abstract
AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: as language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape -- detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text…
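The abstract is cut off before it describes the inference procedure, so the following is only a minimal sketch of what an SDEdit-style edit under flow matching could look like, not the authors' implementation. It assumes a linear (rectified-flow style) path x_t = (1 - t) * noise + t * data, and a hypothetical `strength` parameter that sets how far the source embeddings are pushed toward noise before being integrated back. The names `sdedit_flow_transfer` and `toy_velocity` are illustrative; `toy_velocity` is a closed-form stand-in for the style-conditioned DiT velocity network.

```python
import random


def sdedit_flow_transfer(source_emb, velocity_fn, strength=0.5, num_steps=20, seed=0):
    """Partially noise source embeddings, then Euler-integrate a velocity field to t = 1.

    source_emb : list of token embeddings (list of lists of floats).
    velocity_fn: callable (x, t) -> velocity with the same shape as x.
                 Stand-in for a learned, style-conditioned network (assumption).
    strength   : 0 keeps the source untouched, 1 restarts from pure noise (assumption).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if strength == 0.0:
        # Nothing to edit: return a copy of the source embeddings.
        return [row[:] for row in source_emb]

    rng = random.Random(seed)
    t0 = 1.0 - strength  # start time on the assumed linear path

    # Assumed interpolation: x_t0 = (1 - t0) * noise + t0 * source.
    x = [[(1.0 - t0) * rng.gauss(0.0, 1.0) + t0 * v for v in row] for row in source_emb]

    dt = strength / num_steps
    for k in range(num_steps):
        t_cur = t0 + k * dt
        # Pin the final time to exactly 1.0 to avoid accumulated rounding error.
        t_next = 1.0 if k == num_steps - 1 else t0 + (k + 1) * dt
        vel = velocity_fn(x, t_cur)
        # Explicit Euler step along the velocity field.
        x = [[xi + (t_next - t_cur) * vi for xi, vi in zip(xr, vr)]
             for xr, vr in zip(x, vel)]
    return x


def toy_velocity(target):
    """Closed-form velocity whose flow ends at a single target point at t = 1."""
    def fn(x, t):
        return [[(ti - xi) / (1.0 - t) for xi, ti in zip(xr, tr)]
                for xr, tr in zip(x, target)]
    return fn


# Usage: three "tokens" with four-dimensional embeddings.
source = [[1.0] * 4 for _ in range(3)]
target = [[-2.0] * 4 for _ in range(3)]
edited = sdedit_flow_transfer(source, toy_velocity(target), strength=0.6, num_steps=10)
```

In this sketch a smaller `strength` keeps the starting point closer to the source embeddings, which is the usual SDEdit trade-off between fidelity to the input and freedom to change it. How StyleShield sets this trade-off, and how it maps the resulting embeddings back to tokens, is not stated in the text available here.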
