Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Maximilian Mozes, Max Bartolo, Pontus Stenetorp, Bennett Kleinberg, Lewis D. Griffin

arXiv:2109.04385·cs.CL·September 10, 2021

TL;DR

This paper compares human- and machine-generated adversarial examples for text classification. It shows that humans can efficiently produce natural-reading, sentiment-preserving attacks comparable to those produced by attack algorithms, with implications for NLP robustness.

Contribution

It introduces a human-in-the-loop approach to generating adversarial examples and compares the quality and efficiency of the resulting examples with those of state-of-the-art attack algorithms.

Findings

1. Humans can efficiently generate many valid adversarial examples.
2. Human- and algorithm-generated examples are similar in naturalness and sentiment preservation.
3. Humans are more computationally efficient than the algorithms at creating adversarial examples.

Abstract

Research shows that natural language processing models are generally vulnerable to adversarial attacks, but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this question through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial number of adversarial examples using semantics-preserving word…
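
The interactive protocol described in the abstract (replace one word, immediately see the model's prediction, repeat until the label flips) can be sketched as a small simulation. This is a minimal illustration under loudly stated assumptions, not the authors' study code: `classify` is a hypothetical lexicon-based toy standing in for the sentiment model, `ALLOWED` is a hypothetical synonym whitelist standing in for a semantics-preservation constraint, and a scripted list of edits plays the role of the human participant. The names `Session`, `substitute`, and `is_adversarial` are likewise illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical toy sentiment "model" (NOT the classifier used in the study):
# positive lexicon hits minus negative hits; ties count as positive.
POSITIVE = {"good", "great", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "dull", "boring", "awful"}


def classify(tokens: list[str]) -> str:
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"


# Hypothetical whitelist approximating a semantics-preservation constraint:
# a word may only be replaced by one of its listed near-synonyms.
ALLOWED = {
    "good": {"decent", "fine"},
    "great": {"solid", "fine"},
    "film": {"movie"},
}


@dataclass
class Session:
    """One simulated editing session on a single input text."""

    tokens: list[str]
    original_label: str = field(init=False)
    # Each entry: (position, old word, new word, model label after the edit).
    history: list[tuple[int, str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.tokens = list(self.tokens)  # work on a copy of the input
        self.original_label = classify(self.tokens)

    def substitute(self, index: int, new_word: str) -> str:
        """Apply one constrained word substitution; return the model's feedback."""
        old_word = self.tokens[index]
        if new_word not in ALLOWED.get(old_word, set()):
            raise ValueError(f"{old_word!r} -> {new_word!r} violates the constraint")
        self.tokens[index] = new_word
        label = classify(self.tokens)
        self.history.append((index, old_word, new_word, label))
        return label

    @property
    def is_adversarial(self) -> bool:
        # The attack succeeds once the prediction differs from the original label.
        return classify(self.tokens) != self.original_label


# Scripted edits stand in for the human participant.
session = Session("a good film but a bit dull".split())
for index, word in [(2, "movie"), (1, "decent")]:
    label = session.substitute(index, word)
    print(f"position {index} -> {word!r}: model predicts {label}")
    if session.is_adversarial:
        break

print(" ".join(session.tokens), session.is_adversarial)
```

In this toy run, swapping "film" for "movie" leaves the prediction at "positive", while swapping "good" for "decent" flips it to "negative", so the session ends with an adversarial example. Rejecting disallowed edits outright is just one simple way to enforce a constraint; it also makes the feasibility question from the abstract concrete, since a strict enough whitelist can leave no sequence of edits that flips the label.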

Code & Models

Repositories

maximilianmozes/human_adversaries (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection