Overfitting in adversarially robust deep learning
Leslie Rice, Eric Wong, J. Zico Kolter

TL;DR
This paper shows empirically that overfitting to the training set substantially degrades robust test performance in adversarially trained deep networks, and that early stopping largely mitigates the problem across multiple datasets and perturbation models.
Contribution
It documents the detrimental effect of overfitting on adversarial robustness and shows that early stopping is a simple yet effective remedy, in contrast to the common observation that overparameterization and long training do not unduly harm generalization in standard training.
Findings
Overfitting reduces adversarial robustness across datasets.
Early stopping matches the performance gains of recent algorithmic improvements upon adversarial training.
Classical remedies like regularization do not outperform early stopping.
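As an illustration of the early-stopping finding, here is a minimal sketch of checkpoint selection based on held-out robust error. It is not the authors' code: the function names `best_checkpoint` and `robust_overfitting_gap` are hypothetical, and the error curve is made-up data chosen only to show the shape of the effect (robust error falls, then rises again with further training).

```python
from typing import Sequence, Tuple


def best_checkpoint(val_robust_error: Sequence[float]) -> Tuple[int, float]:
    """Return (epoch, error) for the epoch with the lowest held-out robust error.

    Ties are resolved in favour of the earliest epoch.
    """
    if not val_robust_error:
        raise ValueError("need at least one epoch of validation results")
    best_epoch = min(range(len(val_robust_error)), key=val_robust_error.__getitem__)
    return best_epoch, val_robust_error[best_epoch]


def robust_overfitting_gap(val_robust_error: Sequence[float]) -> float:
    """Difference between the final-epoch robust error and the best robust error."""
    _, best = best_checkpoint(val_robust_error)
    return val_robust_error[-1] - best


# Hypothetical per-epoch robust error on a held-out set (illustrative only,
# not results from the paper).
curve = [0.62, 0.55, 0.50, 0.47, 0.49, 0.52, 0.54]

epoch, err = best_checkpoint(curve)
gap = robust_overfitting_gap(curve)
print(f"stop at epoch {epoch}, robust error {err:.2f}, final-epoch gap {gap:.2f}")
```

In practice the per-epoch values would come from evaluating each saved checkpoint against an attack on held-out data; the sketch only covers the selection step.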
Abstract
It is common practice in deep learning to use overparameterized networks and train for as long as possible; there are numerous studies that show, both theoretically and empirically, that such practices surprisingly do not unduly harm the generalization performance of the classifier. In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models ($\ell_\infty$ and $\ell_2$). Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by…
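For reference, "minimizing the loss under worst-case adversarial perturbations" is usually written as a min-max problem. The form below is the standard robust-optimization formulation rather than a quotation from the paper; the symbols $f_\theta$ (network with parameters $\theta$), $\ell$ (loss), $(x_i, y_i)$ (training pairs), $\delta$ (perturbation), and $\epsilon$ (perturbation budget) are notation assumed here for illustration.

```latex
\min_{\theta} \; \sum_{i} \; \max_{\|\delta\|_{p} \le \epsilon} \;
  \ell\bigl(f_{\theta}(x_i + \delta),\, y_i\bigr),
\qquad p \in \{\infty,\, 2\}
```

The choice of $p$ corresponds to the $\ell_\infty$ and $\ell_2$ perturbation models.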
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
