Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan

TL;DR
This paper introduces a probabilistic framework and two methods, Greedy Attack and Gumbel Attack, for generating adversarial examples on discrete data, and shows that both are effective against text classification models.
Contribution
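It proposes a probabilistic framework for adversarial attacks on discrete data and derives two attack methods from it: the perturbation-based Greedy Attack and the scalable, learning-based Gumbel Attack. Both are evaluated with quantitative metrics and human evaluation on a word-based CNN, a character-based CNN and an LSTM.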
Findings
Character-based CNN accuracy drops to the level of random selection when Greedy Attack modifies only five characters
Methods outperform baseline attacks in effectiveness
Human evaluation confirms attack success
Abstract
We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN and an LSTM. As an example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection by modifying only five characters through Greedy Attack.
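The abstract does not spell out how a perturbation-based attack on discrete inputs proceeds, so the following is only a minimal sketch of one plausible greedy scheme, not the authors' implementation. It assumes a two-stage procedure: first rank positions by how much masking them lowers the true-class probability, then greedily pick a replacement token for each of the top k positions. The names `greedy_attack`, `toy_prob`, the `<pad>` mask token, and the toy keyword classifier are all hypothetical and exist only for illustration.

```python
import math
from typing import Callable, List


def toy_prob(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in classifier: returns [P(negative), P(positive)]."""
    score = tokens.count("good") - tokens.count("bad")
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]


def greedy_attack(
    x: List[str],
    true_label: int,
    prob_fn: Callable[[List[str]], List[float]],
    vocab: List[str],
    k: int,
    mask_token: str = "<pad>",
) -> List[str]:
    """Assumed two-stage greedy perturbation of at most k positions."""
    base = prob_fn(x)[true_label]

    # Stage 1: score each position by the drop in true-class probability
    # when that position alone is replaced by the mask token.
    scores = []
    for i in range(len(x)):
        masked = x[:i] + [mask_token] + x[i + 1:]
        scores.append(base - prob_fn(masked)[true_label])
    positions = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)[:k]

    # Stage 2: for each selected position in turn, choose the vocabulary
    # token that minimizes the true-class probability given earlier choices.
    adv = list(x)
    for i in positions:
        adv[i] = min(
            vocab,
            key=lambda v: prob_fn(adv[:i] + [v] + adv[i + 1:])[true_label],
        )
    return adv


x = ["good", "movie", "good", "plot"]
vocab = ["good", "bad", "movie", "plot"]
adv = greedy_attack(x, true_label=1, prob_fn=toy_prob, vocab=vocab, k=2)
print(adv, toy_prob(adv))
```

The parameter `k` plays the role of the perturbation budget, analogous to the five-character budget quoted in the abstract. This sketch queries the model once per position in stage 1 and once per position and candidate token in stage 2, which hints at why a learning-based alternative may scale better.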
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Terrorism, Counterterrorism, and Political Violence
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
