Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers
Evan Crothers, Nathalie Japkowicz, Herna Viktor, Paula Branco

TL;DR
This paper evaluates the robustness of neural and statistical features in detecting AI-generated text, revealing that statistical features offer enhanced adversarial robustness and identifying promising features for improved detection.
Contribution
The paper analyzes how robust detection methods are to adversarial attacks and proposes statistical features as an adversarially robust complement to neural features.
Findings
Statistical features provide better adversarial robustness than neural features.
Complex phrasal features are less effective against modern generative models.
ΔMAUVE is proposed as a proxy for human judgment of adversarial text quality.
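To make "statistical features" concrete, here is a minimal sketch of a few corpus-style statistics computed from raw text. This summary does not list the paper's actual feature set, so the function name `statistical_features` and the four features below (type-token ratio, mean word length, mean sentence length, hapax ratio) are illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter
from typing import Dict


def statistical_features(text: str) -> Dict[str, float]:
    """Compute a few simple, hand-crafted text statistics.

    Hypothetical illustration only: these features are common stylometric
    statistics, not the feature set used in the paper.
    """
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    if not words or not sentences:
        return {
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
            "hapax_ratio": 0.0,
        }

    counts = Counter(words)
    return {
        # Distinct words divided by total words (lexical diversity).
        "type_token_ratio": len(counts) / len(words),
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
        # Share of tokens that occur exactly once in the text.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / len(words),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    for name, value in statistical_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Features of this kind are computed directly from token counts rather than from a learned embedding, so a vector of them could be fed to any standard classifier alongside, or instead of, a neural detector's output.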
Abstract
The detection of computer-generated text is an area of rapidly increasing significance as nascent generative models allow for efficient creation of compelling human-like text, which may be abused for the purposes of spam, disinformation, phishing, or online influence campaigns. Past work has studied detection of current state-of-the-art models, but despite a developing threat landscape, there has been minimal analysis of the robustness of detection methods to adversarial attacks. To this end, we evaluate neural and non-neural approaches on their ability to detect computer-generated text, their robustness against text adversarial attacks, and the impact that successful adversarial attacks have on human judgement of text quality. We find that while statistical features underperform neural features, statistical features provide additional adversarial robustness that can be leveraged in…
Taxonomy
Topics: Neural Networks and Applications
