TL;DR
This paper introduces a structured framework for AI-text detection that disentangles semantic content from generator artifacts, improving robustness and generalization across diverse language models.
Contribution
The proposed method advances AI-text detection by effectively separating semantic information from generator-specific artifacts, enhancing performance on unseen models.
Findings
Achieves up to 24.2% accuracy gain over state-of-the-art methods.
Demonstrates strong scalability with increasing generator diversity.
Maintains robust performance in open-set scenarios.
Abstract
As large language models (LLMs) generate text that increasingly resembles human writing, the subtle cues that distinguish AI-generated content from human-written content become harder to capture. Reliance on generator-specific artifacts is inherently unstable, since new models emerge rapidly and reduce the robustness of such shortcuts. This makes generalization to unseen generators a central and challenging problem for AI-text detection. To tackle this challenge, we propose a progressively structured framework that disentangles AI-detection semantics from generator-aware artifacts. This is achieved through a compact latent encoding that encourages semantic minimality, followed by perturbation-based regularization to reduce residual entanglement, and finally a discriminative adaptation stage that aligns representations with task objectives. Experiments on MAGE benchmark,…
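The three stages named in the abstract (compact latent encoding, perturbation-based regularization, discriminative adaptation) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's implementation: the linear bottleneck, the Gaussian input noise, the squared-distance consistency penalty, the logistic head, and all function and parameter names (`encode`, `perturb`, `consistency_penalty`, `predict_proba`, `total_loss`, `sigma`, `lam`) are hypothetical.

```python
import math
import random


def encode(x, weights):
    """Project a feature vector onto a smaller latent vector (linear bottleneck).

    Hypothetical stand-in for the compact latent encoding.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def perturb(x, sigma, rng):
    """Add Gaussian noise to the input (assumed perturbation, not the paper's)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def consistency_penalty(z, z_perturbed):
    """Mean squared distance between clean and perturbed latents."""
    return sum((a - b) ** 2 for a, b in zip(z, z_perturbed)) / len(z)


def predict_proba(z, head):
    """Logistic head on the latent: probability that the text is AI-generated."""
    logit = sum(h * zi for h, zi in zip(head, z))
    return 1.0 / (1.0 + math.exp(-logit))


def total_loss(x, label, weights, head, sigma=0.1, lam=0.5, seed=0):
    """Binary cross-entropy on the clean latent plus a weighted consistency term."""
    rng = random.Random(seed)
    z = encode(x, weights)
    z_pert = encode(perturb(x, sigma, rng), weights)
    p = predict_proba(z, head)
    eps = 1e-12  # guards against log(0)
    bce = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
    return bce + lam * consistency_penalty(z, z_pert)
```

In this sketch the consistency term penalizes latents that shift when the input is perturbed, which is one simple way to discourage a representation from depending on fragile surface features. How the paper actually defines its perturbations and objectives is not stated in the abstract.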
