T-Detect: Tail-Aware Statistical Normalization for Robust Detection of Adversarial Machine-Generated Text
Alva West, Luodan Zhang, Liuliu Zhang, Minjun Zhu, Yixuan Weng, Yue Zhang

TL;DR
T-Detect introduces a heavy-tailed statistical normalization based on Student's t-distribution to make detection of adversarially perturbed machine-generated text more robust, outperforming existing methods on the RAID benchmark.
Contribution
The paper presents a novel heavy-tailed normalization approach for adversarial text detection, replacing Gaussian assumptions with Student's t-distribution, backed by theoretical and empirical validation.
Findings
Improves AUROC by up to 3.9% on the RAID benchmark.
Achieves a state-of-the-art AUROC of 0.926 on the RAID Books domain.
Demonstrates robustness against adversarial perturbations.
Abstract
Large language models (LLMs) have shown the capability to generate fluent and logical content, presenting significant challenges to machine-generated text detection, particularly for text polished by adversarial perturbations such as paraphrasing. Current zero-shot detectors often employ Gaussian distributions as the statistical measure for computing detection thresholds, an assumption that falters when confronted with the heavy-tailed statistical artifacts characteristic of adversarial or non-native English texts. In this paper, we introduce T-Detect, a novel detection method that fundamentally redesigns curvature-based detectors. Our primary innovation is the replacement of standard Gaussian normalization with a heavy-tailed discrepancy score derived from the Student's t-distribution. This approach is theoretically grounded in the empirical observation that adversarial texts exhibit significant…
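The abstract is truncated here and does not state the scoring formula, so the following is only a minimal sketch of one plausible reading of "replacing Gaussian normalization with a Student's t-based score". The function names, the `nu` (degrees of freedom) parameter, and its default value are illustrative assumptions, not taken from the paper. The sketch also compares log-densities to show what "heavy-tailed" means in practice: a large deviation that is nearly impossible under a Gaussian remains fairly plausible under a Student's t-distribution.

```python
import math


def gaussian_score(observed, expected, variance):
    """Standard z-score: discrepancy divided by the Gaussian standard deviation."""
    return (observed - expected) / math.sqrt(variance)


def heavy_tailed_score(observed, expected, variance, nu=5.0):
    """Hypothetical t-based normalization (an assumption, not the paper's formula).

    A Student's t with nu degrees of freedom and squared scale `variance`
    has variance `variance * nu / (nu - 2)`, which is finite only for nu > 2.
    """
    if nu <= 2.0:
        raise ValueError("nu must exceed 2 for the variance to be finite")
    scale = math.sqrt(variance * nu / (nu - 2.0))
    return (observed - expected) / scale


def normal_logpdf(x):
    """Log-density of the standard normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x


def student_t_logpdf(x, nu):
    """Log-density of the standard Student's t-distribution with nu degrees of freedom."""
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - (nu + 1.0) / 2.0 * math.log1p(x * x / nu)
    )


if __name__ == "__main__":
    # The same raw discrepancy yields a smaller magnitude under the t-based scale.
    print(gaussian_score(2.0, 0.0, 1.0))
    print(heavy_tailed_score(2.0, 0.0, 1.0, nu=5.0))
    # A 6-sigma outlier is far less surprising under the heavy-tailed model.
    print(normal_logpdf(6.0), student_t_logpdf(6.0, nu=5.0))
```

As `nu` grows, the t-based scale converges to the Gaussian one, so in this sketch the Gaussian detector is the limiting case of the heavy-tailed one.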
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning
