Adversarial Examples Are Not Real Features
Ang Li, Yifei Wang, Yiwen Guo, Yisen Wang

TL;DR
This paper challenges the idea that adversarial examples arise from useful non-robust features. It shows that these features behave more like paradigm-specific shortcuts: they transfer poorly across learning paradigms, and robust features on their own do not guarantee robustness.
Contribution
The study re-examines the role of non-robust features across multiple learning paradigms, revealing their limited usefulness and their nature as paradigm-specific shortcuts.
Findings
Non-robust features transfer poorly across different paradigms.
Encoders trained naturally on robust features are still largely non-robust under AutoAttack.
Non-robust features are more like shortcuts than genuinely useful features.
Abstract
The existence of adversarial examples has been a mystery for years and attracted much interest. A well-known theory by Ilyas et al. (2019) explains adversarial vulnerability from a data perspective by showing that one can extract non-robust features from adversarial examples and these features alone are useful for classification. However, the explanation remains quite counter-intuitive since non-robust features are mostly noise features to humans. In this paper, we re-examine the theory from a larger context by incorporating multiple learning paradigms. Notably, we find that contrary to their good usefulness under supervised learning, non-robust features attain poor usefulness when transferred to other self-supervised learning paradigms, such as contrastive learning, masked image modeling, and diffusion models. It reveals that non-robust features are not really as useful as…
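Illustrative sketch
The abstract refers to extracting non-robust features from adversarial examples. One common reading, in the spirit of the Ilyas et al. construction, is to perturb each input toward a target class and then relabel it with that target, so that only the small perturbation links input to label. The snippet below is a minimal sketch of that idea, not the authors' code. The toy linear softmax classifier, the L-inf budget, the random targets, and the names `targeted_pgd` and `build_nonrobust_dataset` are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def targeted_pgd(x, target, W, b, eps=0.5, alpha=0.1, steps=20):
    """Push x toward `target` under a toy linear softmax model (assumption),
    staying inside an L-inf ball of radius eps around the original x."""
    x_adv = x.copy()
    onehot = np.eye(W.shape[1])[target]
    for _ in range(steps):
        p = softmax(x_adv @ W + b)
        # Gradient of cross-entropy toward the target w.r.t. the input
        # (closed form for a linear model).
        grad = (p - onehot) @ W.T
        # Signed descent step on the target-class loss.
        x_adv = x_adv - alpha * np.sign(grad)
        # Project back into the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def build_nonrobust_dataset(x, W, b, num_classes, rng, **pgd_kwargs):
    """Hypothetical helper: draw random target labels, perturb toward them,
    and return (perturbed inputs, target labels) as a relabeled dataset."""
    target = rng.integers(0, num_classes, size=len(x))
    x_adv = targeted_pgd(x, target, W, b, **pgd_kwargs)
    return x_adv, target

# Toy data and model parameters (random, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
b = np.zeros(3)
x = rng.normal(size=(16, 8))
x_nr, y_nr = build_nonrobust_dataset(x, W, b, num_classes=3, rng=rng, eps=0.5)
```

In a cross-paradigm study like the one summarized here, a dataset such as `(x_nr, y_nr)` would be handed to learners from different paradigms to compare how useful the perturbation-only signal is. The toy model above only stands in for a real network.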
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Diffusion
