Generating Natural Language Adversarial Examples

Moustafa Alzantot; Yash Sharma; Ahmed Elgohary; Bo-Jhang Ho; Mani Srivastava; Kai-Wei Chang

arXiv:1804.07998·cs.CL·September 26, 2018

TL;DR

This paper presents a black-box optimization method to generate natural language adversarial examples that are semantically similar yet fool sentiment analysis and entailment models with high success rates, highlighting challenges in NLP robustness.

Contribution

Introduces a black-box, population-based optimization approach that generates adversarial text examples which stay semantically and syntactically similar to the originals while fooling well-trained NLP models.

Findings

1. 97% attack success rate against sentiment analysis models
2. 70% attack success rate against textual entailment models
3. 92.3% of the successful sentiment analysis adversarial examples are still assigned their original label by human readers

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the model to misclassify. In the image domain, these perturbations are often virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. We additionally demonstrate that 92.3% of the successful sentiment analysis adversarial examples are classified to their original…
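
The abstract names the technique only as a black-box, population-based optimization algorithm. As a rough illustration of what such a search can look like, here is a minimal genetic-algorithm sketch in Python. Everything in it is an assumption made for the example: the `SYNONYMS` table, the toy `positive_score` classifier and its `WEIGHTS`, and the helpers `perturb`, `crossover`, and `attack` are hypothetical stand-ins, not the authors' implementation. The only property it shares with the described setting is that the attacker queries the model's output score and never inspects its internals.

```python
import math
import random

# Hypothetical synonym table; a real attack would need a far larger
# source of semantically close replacements.
SYNONYMS = {
    "loved": ["liked", "enjoyed"],
    "great": ["fine", "decent"],
    "movie": ["film"],
}

# Toy stand-in for the victim model (an assumption, not a real classifier).
WEIGHTS = {"loved": 2.0, "great": 2.0, "enjoyed": 1.0,
           "liked": 0.5, "decent": 0.3, "fine": 0.2}


def positive_score(tokens):
    """Black-box query: returns only P(positive) for a token list."""
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens) - 1.0
    return 1.0 / (1.0 + math.exp(-logit))


def fitness(tokens):
    """The attacker wants the 'negative' label, so lower P(positive) is fitter."""
    return 1.0 - positive_score(tokens)


def perturb(tokens, original, rng):
    """At one random replaceable position, keep the best-scoring synonym."""
    positions = [i for i, w in enumerate(original) if w in SYNONYMS]
    if not positions:
        return list(tokens)
    i = rng.choice(positions)
    candidates = [tokens[:i] + [s] + tokens[i + 1:] for s in SYNONYMS[original[i]]]
    return max(candidates + [list(tokens)], key=fitness)


def crossover(a, b, rng):
    """Build a child by picking each word from one of the two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]


def attack(original, pop_size=8, generations=20, seed=0):
    """Return an adversarial token list, or None if the search fails."""
    rng = random.Random(seed)
    population = [perturb(original, original, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(p) for p in population]
        best = population[scores.index(max(scores))]
        if positive_score(best) < 0.5:
            return best
        children = [best]  # elitism: carry the fittest member forward
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=scores, k=2)
            children.append(perturb(crossover(p1, p2, rng), original, rng))
        population = children
    best = max(population, key=fitness)
    return best if positive_score(best) < 0.5 else None
```

Calling `attack("i loved this great movie".split())` searches for a synonym-substituted sentence that the toy model scores below 0.5. Restricting every replacement to synonyms of the word originally at that position is one simple way to keep candidates close to the input; it does not by itself guarantee that meaning or grammar are preserved.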
