Adversarial Training Can Hurt Generalization
Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

TL;DR
This paper demonstrates that adversarial training can hurt standard accuracy with finite data, even when the optimal infinite-data predictor is both accurate and robust. This reveals a fundamental tension between robustness and generalization, which robust self-training on unlabeled data mostly eliminates.
Contribution
It shows that a tradeoff between standard and robust accuracy can arise with finite data even when the optimal predictor performs well on both objectives. It also shows that robust self-training, which leverages unlabeled data, mostly eliminates this tradeoff.
Findings
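Adversarial training can hurt standard accuracy even when the optimal predictor (in the infinite-data limit) is both accurate and robust.
The tradeoff arises with finite data and persists even in a convex learning problem, which rules out optimization difficulties as its cause.
Robust self-training with unlabeled data mostly eliminates the tradeoff.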
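For intuition about how adversarial training can remain a convex problem, here is a standard derivation for a linear classifier under ℓ∞ perturbations. This is an illustrative assumption and not the construction used in the paper; the abstract states only that the paper's construction is a convex learning problem. The linear model f_w(x) = wᵀx, labels y ∈ {−1, +1}, the convex nonincreasing margin loss ℓ (for example, logistic loss), and the radius ε are all chosen here for illustration.

```latex
% Inner maximization of adversarial training for a linear model (illustrative)
\max_{\|\delta\|_\infty \le \epsilon} \ell\big(y\, w^\top (x + \delta)\big)
  = \ell\Big( \min_{\|\delta\|_\infty \le \epsilon} y\, w^\top (x + \delta) \Big)
  = \ell\big( y\, w^\top x - \epsilon \|w\|_1 \big),
\qquad \text{since } \min_{\|\delta\|_\infty \le \epsilon} y\, w^\top \delta = -\epsilon \|w\|_1 \ (|y| = 1).

% The argument y w^T x - eps ||w||_1 is concave in w, and a convex nonincreasing
% function of a concave function is convex, so the adversarial training objective
\widehat{R}_{\mathrm{adv}}(w) = \frac{1}{n} \sum_{i=1}^{n} \ell\big( y_i\, w^\top x_i - \epsilon \|w\|_1 \big)
% is convex in w. Setting epsilon = 0 recovers standard empirical risk minimization.
```

The first equality uses only that ℓ is nonincreasing. The minimizing perturbation is δ = −ε y sign(w).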
Abstract
While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary). Previous work has studied this tradeoff between standard and robust accuracy, but only in the setting where no predictor performs well on both objectives in the infinite data limit. In this paper, we show that even when the optimal predictor with infinite data performs well on both objectives, a tradeoff can still manifest itself with finite data. Furthermore, since our construction is based on a convex learning problem, we rule out optimization concerns, thus laying bare a fundamental tension between robustness and generalization. Finally, we show that robust self-training mostly eliminates this tradeoff by leveraging unlabeled data.
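Robust self-training, in its common form, works in three steps. First, fit a standard model on the labeled data. Second, use that model to pseudo-label the unlabeled data. Third, run adversarial training on the labeled and pseudo-labeled points together. The sketch below illustrates these steps under stated assumptions. It is not the paper's experimental setup. The linear logistic model, the ℓ∞ threat model with its closed-form worst case, the plain (sub)gradient descent, and the function names `train_linear`, `robust_self_training`, `standard_accuracy`, and `robust_accuracy` are all hypothetical choices made for illustration.

```python
import numpy as np


def _neg_sigmoid_neg(m):
    # Derivative of log(1 + exp(-m)) w.r.t. m, i.e. -sigmoid(-m), written via tanh for stability.
    return -0.5 * (1.0 - np.tanh(0.5 * m))


def train_linear(X, y, eps=0.0, lr=0.1, steps=500):
    """(Sub)gradient descent on the mean logistic loss of the worst-case margin
    y * w @ x - eps * ||w||_1 (assumed linear model, l_inf perturbations of radius eps).

    eps = 0 is standard training; eps > 0 is adversarial training.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w) - eps * np.abs(w).sum()
        coef = _neg_sigmoid_neg(margins)  # shape (n,)
        # Subgradient of each worst-case margin: y_i * x_i - eps * sign(w).
        grad = (coef[:, None] * (y[:, None] * X - eps * np.sign(w))).mean(axis=0)
        w -= lr * grad
    return w


def robust_self_training(X_lab, y_lab, X_unlab, eps, **kw):
    """Hypothetical minimal robust self-training for the linear model above."""
    w_std = train_linear(X_lab, y_lab, eps=0.0, **kw)      # 1. standard model on labeled data
    y_pseudo = np.where(X_unlab @ w_std >= 0, 1.0, -1.0)   # 2. pseudo-label the unlabeled data
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return train_linear(X_all, y_all, eps=eps, **kw)       # 3. adversarial training on the union


def standard_accuracy(w, X, y):
    # Predict +1 when w @ x >= 0, else -1.
    return np.mean(np.where(X @ w >= 0, 1.0, -1.0) == y)


def robust_accuracy(w, X, y, eps):
    # Counted as robustly correct only if the worst-case margin is strictly positive.
    return np.mean(y * (X @ w) - eps * np.abs(w).sum() > 0)
```

For example, `w = robust_self_training(X_lab, y_lab, X_unlab, eps=0.1)` returns a weight vector. You can then compare its `standard_accuracy` and `robust_accuracy` against a model from `train_linear(X_lab, y_lab, eps=0.1)` trained on labeled data alone. Keeping the inner maximization in closed form makes every training step a convex problem. Any difference between the two models in this toy setting therefore reflects the objectives and data, not optimization failure.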
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
