Adversarial Robustness May Be at Odds With Simplicity
Preetum Nakkiran

TL;DR
This paper demonstrates through theoretical examples that achieving adversarial robustness may require more complex classifiers than those used for standard accuracy, highlighting a potential structural tradeoff.
Contribution
The paper gives theoretical examples of classification tasks in which robust classification demands more complex models than standard classification, exponentially more complex in some examples.
Findings
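In the constructed tasks, a simple classifier achieves high standard accuracy, and high accuracy under random noise.
In the same tasks, every simple classifier has high adversarial loss.
Robust classification is possible in these tasks, but only with more complex classifiers (exponentially more complex in some examples).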
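The gap between random noise and adversarial perturbations can be seen in a small numerical sketch. This is a toy setup chosen for illustration, not one of the paper's constructions: the data distribution, the all-ones linear classifier, the l-infinity perturbation budget `eps`, and all function names are assumptions. It illustrates only the first two findings for a single linear classifier and says nothing about what a more complex classifier could achieve.

```python
import random

# Toy illustration (NOT the paper's construction): each input has d
# coordinates, all equal to y * mu, where y in {-1, +1} is the label.

def predict(x):
    """Linear classifier with all-ones weights: sign of the coordinate sum."""
    return 1 if sum(x) >= 0 else -1

def no_noise(x, y, eps, rng):
    return list(x)

def random_noise(x, y, eps, rng):
    # Each coordinate moves by +eps or -eps, chosen uniformly at random.
    return [xi + eps * rng.choice([-1, 1]) for xi in x]

def adversarial_noise(x, y, eps, rng):
    # Worst case for this classifier: push every coordinate against the label.
    return [xi - y * eps for xi in x]

def accuracy(perturb, d=400, mu=0.1, eps=0.2, trials=200, seed=0):
    """Fraction of perturbed samples that the linear classifier labels correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        y = rng.choice([-1, 1])
        x = [y * mu] * d
        correct += predict(perturb(x, y, eps, rng)) == y
    return correct / trials

if __name__ == "__main__":
    print("clean      :", accuracy(no_noise))
    print("random     :", accuracy(random_noise))
    print("adversarial:", accuracy(adversarial_noise))
```

With these assumed parameters the clean margin is `mu * d = 40`. Random sign noise shifts the coordinate sum by roughly `eps * sqrt(d) = 4`, which is far too small to flip the prediction, while the aligned perturbation shifts it by `eps * d = 80` and flips every prediction.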
Abstract
Current techniques in machine learning are so far unable to learn classifiers that are robust to adversarial perturbations. However, they are able to learn non-robust classifiers with very high accuracy, even in the presence of random perturbations. Towards explaining this gap, we highlight the hypothesis that robust classification may require more complex classifiers than standard classification. In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) There exists a simple classifier with high standard accuracy, and also high accuracy under random noise. (2) Any simple classifier is not robust: it must have high adversarial loss with perturbations. (3) Robust classification is possible, but only with more…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Anomaly Detection Techniques and Applications
