Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
Ying Zhou, Ben He, Le Sun

TL;DR
This paper systematically evaluates the robustness of AI-text detectors against various perturbations in real-world scenarios, proposing new perturbation methods and analyzing the effects of data augmentation on detection performance.
Contribution
The paper introduces 12 novel black-box perturbation techniques and provides a comprehensive assessment of detector robustness in practical settings.
Findings
Current detectors lack robustness against perturbations.
Perturbation techniques significantly degrade detection accuracy.
Data augmentation can improve detector resilience.
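As a rough illustration of the ideas above, the sketch below shows what a simple black-box, character-level perturbation and a matching data-augmentation step might look like. It is not one of the paper's 12 methods: the function names (`swap_adjacent_chars`, `augment`), the adjacent-character swap rule, and the `rate` and `seed` parameters are all assumptions made for this example.

```python
import random


def swap_adjacent_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Hypothetical perturbation: swap two adjacent inner letters in some words.

    The attack is black-box in the sense that it only edits the input text
    and never looks at the detector's weights or gradients.
    """
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Only touch purely alphabetic words long enough to have inner letters.
        if len(word) >= 4 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # first and last letters stay fixed
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def augment(dataset, rate: float = 0.1, seed: int = 0):
    """Return the original (text, label) pairs plus one perturbed copy of each.

    Labels are kept unchanged, on the assumption that a typo-level edit does
    not change whether a text was written by a human or a machine.
    """
    augmented = list(dataset)
    for k, (text, label) in enumerate(dataset):
        augmented.append((swap_adjacent_chars(text, rate, seed + k), label))
    return augmented
```

A detector trained on the output of `augment` sees both clean and perturbed versions of each example, which is the general shape of the data-augmentation idea; the perturbations and training setup actually used in the paper may differ.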
Abstract
With the launch of ChatGPT, large language models (LLMs) have attracted global attention. In the realm of article writing, LLMs have witnessed extensive utilization, giving rise to concerns related to intellectual property protection, personal privacy, and academic integrity. In response, AI-text detection has emerged to distinguish between human and machine-generated content. However, recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts. Currently, there is a lack of systematic evaluations regarding detection performance in real-world applications, and a comprehensive examination of perturbation techniques and detector robustness is also absent. To bridge this gap, our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
