Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

TL;DR
This paper shows that stronger regularization substantially improves worst-group generalization in overparameterized neural networks trained with group distributionally robust optimization (group DRO) on tasks with group shifts.
Contribution
The paper shows that coupling group DRO with increased regularization improves worst-group performance, and it introduces a stochastic optimization algorithm with convergence guarantees.
Findings
Coupling group DRO with stronger regularization improves worst-group accuracy by 10-40 percentage points.
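Naive group DRO fails to improve worst-group performance without regularization, because an overparameterized model that perfectly fits the training data already has vanishing worst-case training loss.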
The proposed stochastic optimization algorithm efficiently trains robust models.
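As a rough illustration of what a stochastic group DRO procedure can look like, the sketch below keeps a weight per group and raises the weight of groups with higher loss through an exponentiated-gradient step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the step size `eta_q`, and the fixed toy losses are all hypothetical. In a full training loop, each weight update would be interleaved with a gradient step on the model parameters using the weighted loss.

```python
import math

def update_group_weights(q, group_losses, eta_q=0.1):
    """One exponentiated-gradient ascent step on the group weights (hypothetical helper).

    Groups with larger loss get multiplicatively larger weight; the result
    is renormalized so the weights stay on the probability simplex.
    """
    scaled = [q_g * math.exp(eta_q * loss_g) for q_g, loss_g in zip(q, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def weighted_loss(q, group_losses):
    """Loss the model parameters would be trained on: a q-weighted average of group losses."""
    return sum(q_g * loss_g for q_g, loss_g in zip(q, group_losses))

# Toy example with two groups and fixed (assumed) per-group losses.
q = [0.5, 0.5]
losses = [0.2, 1.5]
for _ in range(50):
    q = update_group_weights(q, losses, eta_q=0.1)

robust = weighted_loss(q, losses)
```

With the losses held fixed, the weights concentrate on the higher-loss group, so the weighted loss approaches the worst-group loss. In actual training the losses change at every step, so the weights track whichever group is currently worst.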
Abstract
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization (a stronger-than-typical L2 penalty or early stopping), we achieve substantially higher worst-group accuracies, with…
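The difference between average and worst-group training loss described in the abstract can be made concrete with a small sketch. The helper names, the toy per-example losses, and the penalty strength `lam` below are assumptions chosen for illustration only.

```python
def per_group_mean_loss(losses, groups):
    """Average the per-example losses within each group label."""
    totals, counts = {}, {}
    for loss, g in zip(losses, groups):
        totals[g] = totals.get(g, 0.0) + loss
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

def average_loss(losses):
    """Standard empirical-risk objective: mean loss over all examples."""
    return sum(losses) / len(losses)

def worst_group_loss(losses, groups):
    """Group DRO objective: the largest per-group mean loss."""
    return max(per_group_mean_loss(losses, groups).values())

def l2_penalty(weights, lam):
    """L2 regularization term lam * ||w||^2 added to the training objective."""
    return lam * sum(w * w for w in weights)

# Four examples from a majority group (0) and one from a minority group (1).
groups = [0, 0, 0, 0, 1]
losses = [0.1, 0.1, 0.1, 0.1, 2.0]

avg = average_loss(losses)               # dominated by the majority group
worst = worst_group_loss(losses, groups) # exposes the minority group

# If a model fits every training example, both objectives vanish together.
fitted = [0.0] * 5
worst_when_fitted = worst_group_loss(fitted, groups)
```

Here the average loss looks moderate while the worst-group loss is large. Once every training loss is zero, the two objectives coincide, which is why the abstract attributes the remaining worst-group error to generalization and turns to regularization such as the L2 term.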
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
Methods: Test
