Quantifying and Understanding Adversarial Examples in Discrete Input Spaces

Volodymyr Kuleshov; Evgenii Nikishin; Shantanu Thakoor; Tingfung Lau; Stefano Ermon

arXiv:2112.06276·cs.LG·December 14, 2021

Open Access

TL;DR

This paper introduces a domain-agnostic framework for understanding and generating adversarial examples in discrete input spaces, such as text and biological data, highlighting their prevalence and underlying causes.

Contribution

It formalizes the concept of synonymous adversarial examples in discrete domains and presents a simple, universal algorithm to generate them across various applications.

Findings

01. The algorithm effectively uncovers adversarial examples in multiple domains.

02. Adversarial examples are linked to spurious token correlations.

03. Discrete adversarial examples are prevalent across different tasks.

Abstract

Modern classification algorithms are susceptible to adversarial examples--perturbations to inputs that cause the algorithm to produce undesirable behavior. In this work, we seek to understand and extend adversarial examples across domains in which inputs are discrete, particularly across new domains, such as computational biology. As a step towards this goal, we formalize a notion of synonymous adversarial examples that applies in any discrete setting and describe a simple domain-agnostic algorithm to construct such examples. We apply this algorithm across multiple domains--including sentiment analysis and DNA sequence classification--and find that it consistently uncovers adversarial examples. We seek to understand their prevalence theoretically and we attribute their existence to spurious token correlations, a statistical phenomenon that is specific to discrete spaces. Our work is a…
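
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to search for synonymous adversarial examples: greedy token substitution against a black-box score. It is not the authors' method. The function name `greedy_synonym_attack`, the `synonyms` table, and the toy linear scorer are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence


def greedy_synonym_attack(
    tokens: Sequence[str],
    synonyms: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    max_swaps: Optional[int] = None,
) -> List[str]:
    """Greedily replace tokens with listed synonyms to lower score(tokens).

    Illustrative assumption: `score` is the classifier's confidence in the
    original label, and `synonyms` maps a token to replacements that are
    assumed to preserve the true label.
    """
    original = list(tokens)
    current = list(tokens)
    budget = len(current) if max_swaps is None else max_swaps

    for _ in range(budget):
        best_score = score(current)
        best_candidate = None
        for i, orig_tok in enumerate(original):
            # Candidates always come from synonyms of the ORIGINAL token,
            # so repeated swaps cannot drift away from the input's meaning.
            for alt in synonyms.get(orig_tok, ()):
                candidate = current[:i] + [alt] + current[i + 1:]
                s = score(candidate)
                if s < best_score:
                    best_score, best_candidate = s, candidate
        if best_candidate is None:
            break  # no single substitution lowers the score further
        current = best_candidate
    return current


# Toy usage with a hypothetical linear "sentiment" scorer.
weights = {"good": 1.0, "great": 0.2, "movie": 0.0, "film": -0.5}


def toy_score(toks: List[str]) -> float:
    return sum(weights.get(t, 0.0) for t in toks)


tokens = ["good", "movie"]
synonyms = {"good": ["great"], "movie": ["film"]}
adv = greedy_synonym_attack(tokens, synonyms, toy_score)
print(adv)  # ['great', 'film']
```

In this toy case the scorer gives "film" a negative weight even though it means the same as "movie". That is the kind of spurious token correlation the abstract points to, and the greedy search exploits it to flip the sign of the score without changing the meaning of the input.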

Taxonomy

Topics: Adversarial Robustness in Machine Learning