Adversarial Texts with Gradient Methods
Zhitao Gong, Wenlu Wang, Bo Li, Dawn Song, Wei-Shinn Ku

TL;DR
This paper introduces a framework adapting gradient-based adversarial attack methods from images to text, overcoming challenges of discrete input spaces and quality measurement, and demonstrates high-quality adversarial text generation.
Contribution
The work presents an approach for generating adversarial texts with gradient methods: the attack operates in the word-embedding space, and Word Mover's Distance (WMD) is used to assess the quality of the resulting texts.
Findings
Effective adversarial texts can be generated with minimal word changes.
The framework successfully incorporates FGM and DeepFool methods.
WMD correlates strongly with adversarial text quality.
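As a rough illustration of how a WMD-style score can compare an original text with a modified one, the sketch below computes the relaxed WMD lower bound, in which every word moves all of its mass to the nearest word in the other text. This is a swapped-in simplification and not the exact WMD, which requires solving an optimal-transport problem. The vocabulary, the 2-d embedding values, and the function name `relaxed_wmd` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings (illustrative values, not from the paper).
EMBEDDINGS = {
    "good": [1.0, 0.2],
    "fine": [0.4, 0.0],
    "bad": [-1.0, 0.1],
    "movie": [0.0, 2.0],
}


def relaxed_wmd(doc_a, doc_b, embeddings):
    """Relaxed WMD: a lower bound on the exact Word Mover's Distance.

    Each token carries uniform mass 1/len(doc) and sends all of it to the
    closest token of the other document. The bound is the larger of the
    two one-sided costs.
    """
    def one_sided(src, dst):
        weight = 1.0 / len(src)
        return sum(
            weight * min(math.dist(embeddings[s], embeddings[d]) for d in dst)
            for s in src
        )

    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


original = ["good", "movie"]
close_edit = ["fine", "movie"]   # replaces one word with a nearby one
far_edit = ["bad", "movie"]      # replaces one word with a distant one

print(relaxed_wmd(original, close_edit, EMBEDDINGS))
print(relaxed_wmd(original, far_edit, EMBEDDINGS))
```

Under these toy values the edit that swaps in a nearby word scores lower than the edit that swaps in a distant one, which is the kind of ordering a distance-based quality measure is meant to capture.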
Abstract
Adversarial samples for images have been extensively studied in the literature. Among the many attack methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework to adapt gradient attack methods for images to the text domain. The main difficulties in generating adversarial texts with gradient methods are i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the quality of adversarial texts is difficult to measure. We tackle the first problem by searching for adversarial examples in the embedding space and then reconstructing the adversarial texts via nearest neighbor search. For the latter problem, we employ the Word Mover's Distance (WMD) to quantify the quality of adversarial texts. Through extensive experiments on three datasets, IMDB movie reviews, Reuters-2 and…
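The two-stage idea in the abstract (perturb in the embedding space, then map back to words by nearest neighbor search) can be sketched on a toy problem. Everything below is an assumption made for illustration: the vocabulary, the 2-d embedding values, the linear classifier (`W`, `B`), and the helper names are hypothetical, and the step uses an L2-normalised gradient as one possible fast-gradient variant. It is not the authors' implementation.

```python
import math

# Hypothetical toy vocabulary with 2-d embeddings (illustrative values only).
EMBEDDINGS = {
    "good": [1.0, 0.2],
    "great": [1.2, 0.1],
    "fine": [0.4, 0.0],
    "bad": [-1.0, 0.1],
    "awful": [-1.3, 0.2],
    "movie": [0.0, 2.0],
    "film": [0.1, 2.1],
}

# Hypothetical linear sentiment model: score = W . mean(word vectors) + B.
W = [1.0, 0.0]
B = 0.0


def score(vectors):
    n = len(vectors)
    mean = [sum(v[d] for v in vectors) / n for d in range(len(W))]
    return sum(w * m for w, m in zip(W, mean)) + B


def predict(vectors):
    return 1 if score(vectors) > 0 else 0


def fgm_perturb(vectors, label, eps):
    """Shift each word vector by eps along the normalised loss gradient.

    For logistic loss on this linear model, the gradient with respect to
    every word vector is (sigmoid(score) - label) * W / n.
    """
    p = 1.0 / (1.0 + math.exp(-score(vectors)))
    coeff = (p - label) / len(vectors)
    grad = [coeff * w for w in W]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    step = [eps * g / norm for g in grad]
    return [[x + s for x, s in zip(v, step)] for v in vectors]


def nearest_word(vector):
    """Reconstruct a discrete token: the vocabulary word closest in L2."""
    return min(EMBEDDINGS, key=lambda w: math.dist(vector, EMBEDDINGS[w]))


def attack(words, label, eps):
    vectors = [EMBEDDINGS[w] for w in words]
    perturbed = fgm_perturb(vectors, label, eps)
    return [nearest_word(v) for v in perturbed]


words = ["good", "movie"]
adversarial = attack(words, label=1, eps=1.5)
print(words, predict([EMBEDDINGS[w] for w in words]))
print(adversarial, predict([EMBEDDINGS[w] for w in adversarial]))
```

In this toy setup the perturbation is applied to every word vector, but only the words whose perturbed vectors land closer to a different vocabulary entry actually change after the nearest neighbor step. Here that is a single word.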
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
