Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers
Jacob M. Springer, Melanie Mitchell, Garrett T. Kenyon

TL;DR
This paper investigates how robust and non-robust features are entangled in neural network classifiers. It argues that small, non-semantic patterns contribute to adversarial vulnerability and that robust classifiers are a more effective source of transferable adversarial examples.
Contribution
The paper extends the work of Ilyas et al. (2019) by analyzing the input patterns that give rise to non-robust features. It hypothesizes that these features respond to small patterns entangled with larger, robust ones, and it presents robust classifiers as an effective source of transferable adversarial examples.
Findings
Non-robust features respond to small, entangled patterns.
Adversarial examples can be created by minimal perturbations to these patterns.
Robust classifiers are more effective in generating transferable adversarial examples.
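The last two findings can be illustrated with a toy sketch. This is not the authors' method or experimental setup, which concern deep networks trained on visual data. It swaps in a generic one-step sign-gradient attack (in the style of FGSM) on a hypothetical linear classifier, then checks whether the perturbed input also fools a second hypothetical model. The weights `w_source` and `w_target`, the input `x`, and the budget `eps` are all made-up values chosen for illustration.

```python
import numpy as np


def predict(w, x):
    """Binary linear classifier: returns +1 or -1 from the sign of w @ x."""
    return 1 if w @ x >= 0 else -1


def sign_gradient_attack(x, y, w, eps):
    """One-step L-infinity attack on the linear classifier sign(w @ x).

    The margin y * (w @ x) has gradient y * w with respect to x, so moving
    each coordinate by -eps * y * sign(w) reduces the margin by
    eps * ||w||_1, the largest drop possible under a max-norm budget of eps.
    """
    return x - eps * y * np.sign(w)


# Hypothetical "source" model used to craft the perturbation, and a
# different hypothetical "target" model used to test transfer.
w_source = np.array([1.0, -2.0, 0.5, 1.5])
w_target = np.array([0.8, -1.5, 1.0, 1.0])

x = np.array([0.2, -0.1, 0.3, 0.1])  # made-up clean input
y = 1                                # its true label

x_adv = sign_gradient_attack(x, y, w_source, eps=0.25)

fooled_source = predict(w_source, x_adv) != y
transferred = predict(w_target, x_adv) != y
max_change = np.max(np.abs(x_adv - x))

print("fooled source model:", fooled_source)
print("transferred to target model:", transferred)
print("largest per-coordinate change:", max_change)
```

In this sketch the example transfers only because the two weight vectors happen to point in similar directions. Whether and why perturbations transfer between real networks is the question the paper studies, and this toy setup does not reproduce its results.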
Abstract
Neural networks trained on visual data are well known to be vulnerable to often imperceptible adversarial perturbations. The reasons for this vulnerability are still being debated in the literature. Recently, Ilyas et al. (2019) showed that this vulnerability arises, in part, because neural network classifiers rely on highly predictive but brittle "non-robust" features. In this paper we extend the work of Ilyas et al. by investigating the nature of the input patterns that give rise to these features. In particular, we hypothesize that in a neural network trained in a standard way, non-robust features respond to small, "non-semantic" patterns that are typically entangled with larger, robust patterns, known to be more human-interpretable, as opposed to solely responding to statistical artifacts in a dataset. Thus, adversarial examples can be formed via minimal perturbations to these small,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
