Robustness May Be at Odds with Accuracy
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry

TL;DR
This paper shows that adversarial robustness and standard accuracy can be inherently at odds: robust models learn different features than standard models, which may cost standard accuracy but tends to align better with human perception.
Contribution
The paper proves that a robustness-accuracy trade-off exists in a fairly simple and natural setting, and attributes it to robust classifiers learning fundamentally different feature representations than standard classifiers.
Findings
Robust models may have lower standard accuracy.
Robust classifiers learn different, more human-aligned features.
The trade-off provably exists in a fairly simple and natural setting.
Abstract
We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed empirically in more complex settings. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.
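The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is a toy construction, not a reproduction of the paper's experiments. It assumes a binary task with one moderately predictive feature (it agrees with the label with probability p) and many weakly predictive Gaussian features, plus an l-infinity adversary with budget eps. The distribution, the parameter values (d, p, eta, eps), and all function names are illustrative assumptions, since this page does not state the setting.

```python
import random

def sign(v):
    return 1 if v >= 0 else -1

def sample(n, d, p, eta, rng):
    """Draw n points (assumed toy distribution): label y in {-1, +1},
    x1 equals y with probability p, and d weak features ~ N(eta * y, 1)."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x1 = y if rng.random() < p else -y
        weak = [rng.gauss(eta * y, 1.0) for _ in range(d)]
        data.append((y, x1, weak))
    return data

def standard_clf(x1, weak):
    # Averages the weak features; accurate on clean data, but not robust.
    return sign(sum(weak) / len(weak))

def robust_clf(x1, weak):
    # Uses only the single moderately predictive feature.
    return sign(x1)

def accuracy(data, clf, eps=0.0):
    """Accuracy when every coordinate is shifted by eps against the label.
    For these two classifiers (linear, non-negative weights) that shift is
    the worst-case l-infinity perturbation of size eps."""
    correct = 0
    for y, x1, weak in data:
        adv_x1 = x1 - eps * y
        adv_weak = [w - eps * y for w in weak]
        correct += clf(adv_x1, adv_weak) == y
    return correct / len(data)

rng = random.Random(0)
d, p = 100, 0.95          # illustrative values
eta = 3 / d ** 0.5        # weak-feature signal strength
eps = 2 * eta             # adversary budget, large enough to flip the weak signal
data = sample(2000, d, p, eta, rng)

std_clean = accuracy(data, standard_clf)
std_adv = accuracy(data, standard_clf, eps)
robust_clean = accuracy(data, robust_clf)
robust_adv = accuracy(data, robust_clf, eps)
print(std_clean, std_adv, robust_clean, robust_adv)
```

Under these assumptions, the averaging classifier is nearly perfect on clean data yet almost always wrong under attack, because the perturbation moves the mean of every weak feature to the other side of zero. The single-feature classifier keeps the same accuracy (about p) with or without the attack, but it can never exceed p on clean data. That gap is the robustness-accuracy tension in miniature.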
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
