Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen

TL;DR
This paper explores the fundamental tradeoffs between model sensitivity and invariance to adversarial perturbations, revealing that defenses against one type can weaken resistance to the other and highlighting the need for new robust approaches.
Contribution
It introduces the concept of invariance-based adversarial examples, demonstrating their existence and impact, and shows how current defenses can be compromised by these attacks.
Findings
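State-of-the-art adversarially-trained and certifiably-robust models can be broken by small perturbations that the models are (provably) robust to, yet that change an input's true class.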
Defenses against sensitivity attacks can reduce robustness to invariance attacks.
Overly invariant classifiers stem from overly-robust features in datasets.
Abstract
Adversarial examples are malicious inputs crafted to induce misclassification. Commonly studied sensitivity-based adversarial examples introduce semantically-small changes to an input that result in a different model prediction. This paper studies a complementary failure mode, invariance-based adversarial examples, that introduce minimal semantic changes that modify an input's true label yet preserve the model's prediction. We demonstrate fundamental tradeoffs between these two types of adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks, and that new approaches are needed to resist both attack types. In particular, we break state-of-the-art adversarially-trained and certifiably-robust models by generating small perturbations that the models are (provably) robust to, yet that change an input's class…
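The distinction between the two attack types can be made concrete with a toy sketch. The code below is illustrative only and is not taken from the paper: the function name `classify_perturbation`, the 1-D inputs, the threshold "oracle" standing in for the true label, and the two threshold "models" are all assumptions chosen to keep the example small. It labels a perturbation as sensitivity-based when the true label is preserved but the prediction changes, and as invariance-based when the true label changes but the prediction is preserved.

```python
from typing import Callable

Labeler = Callable[[float], int]


def classify_perturbation(
    x: float, x_adv: float, oracle: Labeler, model: Labeler, eps: float
) -> str:
    """Return 'sensitivity', 'invariance', or 'none' for a perturbed input.

    Hypothetical helper: `oracle` stands in for the true labeling function,
    `model` for the classifier under attack, `eps` for the perturbation budget.
    """
    # The perturbation must stay within the budget.
    if abs(x_adv - x) > eps:
        return "none"
    # Only consider inputs the model classifies correctly to begin with.
    if model(x) != oracle(x):
        return "none"

    label_kept = oracle(x_adv) == oracle(x)
    pred_kept = model(x_adv) == model(x)

    if label_kept and not pred_kept:
        # True label unchanged, prediction flipped: model is too sensitive.
        return "sensitivity"
    if not label_kept and pred_kept:
        # True label changed, prediction unchanged: model is too invariant.
        return "invariance"
    return "none"


# Toy oracle: the true class boundary sits at 0.
def oracle(v: float) -> int:
    return 1 if v >= 0.0 else 0


# Toy model whose boundary is shifted right: flips inside the true class.
def sensitive_model(v: float) -> int:
    return 1 if v >= 0.3 else 0


# Toy model whose boundary is shifted left: ignores a true class change.
def invariant_model(v: float) -> int:
    return 1 if v >= -0.3 else 0


sens_case = classify_perturbation(0.4, 0.2, oracle, sensitive_model, eps=0.5)
inv_case = classify_perturbation(0.1, -0.2, oracle, invariant_model, eps=0.5)
print(sens_case, inv_case)
```

In this toy setting, moving a model's decision boundary away from an input to resist the first kind of perturbation is exactly what exposes it to the second kind, which mirrors the tradeoff the abstract describes.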
Taxonomy
Topics: Adversarial Robustness in Machine Learning
