Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Shiori Sagawa; Pang Wei Koh; Tatsunori B. Hashimoto; Percy Liang

arXiv:1911.08731·cs.LG·April 3, 2020·364 cites

TL;DR

This paper demonstrates that regularization significantly improves worst-group generalization in overparameterized neural networks trained with distributionally robust optimization, especially on tasks with group shifts.

Contribution

It shows that coupling group DRO with increased regularization enhances worst-case group performance, and introduces a stochastic optimization algorithm with convergence guarantees.
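
The stochastic algorithm mentioned above can be illustrated with a minimal sketch. It assumes the form commonly used for group DRO: the group weights q are raised multiplicatively on groups with high loss (an exponentiated-gradient step), and the model is then trained on the q-weighted loss. The function names, the step size eta_q, and the fixed per-group losses are hypothetical and chosen for illustration only; a real training loop would recompute the losses from the model at every step.

```python
import math

def update_group_weights(q, group_losses, eta_q):
    """One exponentiated-gradient ascent step on the group weights q.

    Groups with higher loss receive more weight; the result is renormalized
    so that the weights stay on the probability simplex.
    """
    scaled = [w * math.exp(eta_q * loss) for w, loss in zip(q, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def robust_loss(q, group_losses):
    """q-weighted loss that the model parameters would be trained to minimize."""
    return sum(w * loss for w, loss in zip(q, group_losses))

# Hypothetical, fixed per-group losses (assumption: two groups only).
q = [0.5, 0.5]
losses = [0.2, 1.5]
for _ in range(50):
    q = update_group_weights(q, losses, eta_q=0.1)

print(q, robust_loss(q, losses))
```

With the losses held fixed, the weights concentrate on the higher-loss group, so the weighted loss approaches the worst-group loss. In actual training the model update would lower that group's loss in turn, and the weights would shift again.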

Findings

01. Coupling group DRO with stronger regularization improves worst-group accuracy by 10-40 percentage points.

02. Without such regularization, naively applied group DRO fails to improve worst-case performance.

03. The proposed stochastic optimization algorithm for group DRO trains robust models efficiently and comes with convergence guarantees.

Abstract

Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization (a stronger-than-typical L2 penalty or early stopping), we achieve substantially higher worst-group accuracies, with…
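
To make the distinction between average loss and worst-group loss concrete, here is a small sketch in plain Python. The helper names, group labels, and loss values are made up for illustration and are not taken from the paper.

```python
from collections import defaultdict

def per_group_mean_loss(losses, group_ids):
    """Mean loss within each pre-defined group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, g in zip(losses, group_ids):
        sums[g] += loss
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def average_loss(losses):
    """Standard (ERM-style) objective: mean loss over all examples."""
    return sum(losses) / len(losses)

def worst_group_loss(losses, group_ids):
    """Group DRO objective: the largest per-group mean loss."""
    return max(per_group_mean_loss(losses, group_ids).values())

# Hypothetical data: nine easy majority examples and one hard minority example.
losses = [0.1] * 9 + [2.0]
group_ids = ["majority"] * 9 + ["minority"]

avg = average_loss(losses)
worst = worst_group_loss(losses, group_ids)
print(avg, worst)
```

In this example the average loss stays low because the minority group makes up only one tenth of the data, while the worst-group loss equals the minority group's loss. When every per-example loss is zero, both quantities are zero, which is why the abstract argues that the training objective alone cannot separate the two for models that fit the training data perfectly.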

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Stochastic Gradient Optimization Techniques