Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers

Evan Crothers; Nathalie Japkowicz; Herna Viktor; Paula Branco

arXiv:2203.07983·cs.CL·October 5, 2022·1 cites

TL;DR

This paper evaluates the robustness of neural and statistical features in detecting AI-generated text, revealing that statistical features offer enhanced adversarial robustness and identifying promising features for improved detection.

Contribution

The paper analyzes how robust detection methods are to adversarial attacks and proposes statistical features as a resilient complement to neural features, which they underperform in raw detection accuracy.

Findings

01

Statistical features underperform neural features at detection but provide additional adversarial robustness.

02

Complex phrasal features are less effective against modern generative models.

03

ΔMAUVE is proposed as a proxy for human judgment of adversarial text quality.

Abstract

The detection of computer-generated text is an area of rapidly increasing significance as nascent generative models allow for efficient creation of compelling human-like text, which may be abused for the purposes of spam, disinformation, phishing, or online influence campaigns. Past work has studied detection of current state-of-the-art models, but despite a developing threat landscape, there has been minimal analysis of the robustness of detection methods to adversarial attacks. To this end, we evaluate neural and non-neural approaches on their ability to detect computer-generated text, their robustness against text adversarial attacks, and the impact that successful adversarial attacks have on human judgement of text quality. We find that while statistical features underperform neural features, statistical features provide additional adversarial robustness that can be leveraged in…

Code & Models

Repositories

ecrows/cgtext-detection-adv
none · Official

Taxonomy

Topics

Neural Networks and Applications