Quantifying and Understanding Adversarial Examples in Discrete Input Spaces
Volodymyr Kuleshov, Evgenii Nikishin, Shantanu Thakoor, Tingfung Lau, Stefano Ermon

TL;DR
This paper introduces a domain-agnostic framework for understanding and generating adversarial examples in discrete input spaces, such as text and biological data, highlighting their prevalence and underlying causes.
Contribution
It formalizes the concept of synonymous adversarial examples in discrete domains and presents a simple, universal algorithm to generate them across various applications.
Findings
The algorithm effectively uncovers adversarial examples in multiple domains.
Adversarial examples are linked to spurious token correlations.
Discrete adversarial examples are prevalent across different tasks.
Abstract
Modern classification algorithms are susceptible to adversarial examples--perturbations to inputs that cause the algorithm to produce undesirable behavior. In this work, we seek to understand and extend adversarial examples across domains in which inputs are discrete, particularly across new domains, such as computational biology. As a step towards this goal, we formalize a notion of synonymous adversarial examples that applies in any discrete setting and describe a simple domain-agnostic algorithm to construct such examples. We apply this algorithm across multiple domains--including sentiment analysis and DNA sequence classification--and find that it consistently uncovers adversarial examples. We seek to understand their prevalence theoretically and we attribute their existence to spurious token correlations, a statistical phenomenon that is specific to discrete spaces. Our work is a…
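The abstract mentions a simple, domain-agnostic algorithm for constructing synonymous adversarial examples but does not give its steps. What follows is a minimal sketch of one generic approach, a greedy synonym-substitution attack, and it should not be read as the authors' method. It rests on several assumptions. The classifier is hypothetical and returns the probability of the original class. The synonym table is also hypothetical; it maps each token to tokens the user deems interchangeable, so what counts as a "synonym" in text, DNA, or another discrete domain is supplied from outside. The toy weights are invented to imitate a spurious token correlation: "film" and "movie" mean the same thing, but the toy model has learned a large negative weight for "movie".

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interface: maps a token sequence to P(original class).
Classifier = Callable[[Sequence[str]], float]


def greedy_synonym_attack(
    tokens: List[str],
    classifier: Classifier,
    synonyms: Dict[str, List[str]],
    max_swaps: int = 3,
    threshold: float = 0.5,
) -> List[str]:
    """Greedily replace tokens with synonyms to push P(original class) below threshold.

    Each round tries every single-token synonym swap and keeps the one that
    lowers the score most. Stops when the prediction flips, no swap helps,
    or max_swaps rounds have run. Returns the (possibly unchanged) sequence.
    """
    current = list(tokens)
    for _ in range(max_swaps):
        best_score = classifier(current)
        if best_score < threshold:
            break  # prediction already flipped
        best_edit: Optional[Tuple[int, str]] = None
        for i, tok in enumerate(current):
            for alt in synonyms.get(tok, []):
                candidate = current[:i] + [alt] + current[i + 1:]
                score = classifier(candidate)
                if score < best_score:
                    best_score, best_edit = score, (i, alt)
        if best_edit is None:
            break  # no single swap lowers the score further
        i, alt = best_edit
        current[i] = alt
    return current


# Toy bag-of-words logistic model (invented weights, for illustration only).
# "movie" carries a large negative weight even though it is a synonym of "film",
# mimicking a spurious correlation picked up from training data.
WEIGHTS = {"great": 2.0, "film": 0.5, "movie": -3.0, "boring": -2.0}


def toy_classifier(tokens: Sequence[str]) -> float:
    """Return P(positive) = sigmoid(sum of per-token weights)."""
    z = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical synonym table; in a non-text domain this would be domain-specific.
SYNONYMS = {"film": ["movie"], "great": ["excellent"]}

original = ["a", "great", "film"]
adv = greedy_synonym_attack(original, toy_classifier, SYNONYMS)
print(adv, round(toy_classifier(original), 3), round(toy_classifier(adv), 3))
```

In this toy setting, swapping "film" for "movie" flips the prediction from positive to negative. The change comes entirely from a weight attached to one surface form, not from the meaning the two tokens share. Greedy single-token search is used here only because it is short and model-agnostic, since it needs nothing but classifier queries.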
Taxonomy
Topics: Adversarial Robustness in Machine Learning
