Distributionally Robust Losses for Latent Covariate Mixtures

John Duchi; Tatsunori Hashimoto; Hongseok Namkoong

arXiv:2007.13982 · cs.LG · August 12, 2022 · 6 cites

TL;DR

This paper introduces a convex procedure that controls worst-case loss over all subpopulations of a given size in heterogeneous datasets, backed by finite-sample convergence guarantees and empirical validation on lexical similarity, wine quality, and recidivism prediction tasks.

Contribution

The paper proposes a convex procedure that controls worst-case performance over all subpopulations of a given size, with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation.

Findings

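01 The procedure controls worst-case loss over all subpopulations of a given size.

02 Finite-sample (nonparametric) convergence guarantees are established for the worst-off subpopulation.

03 On lexical similarity, wine quality, and recidivism prediction tasks, the learned models do well against unseen subpopulations.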

Abstract

While modern large-scale datasets often consist of heterogeneous subpopulations -- for example, multiple demographic groups or multiple text corpora -- the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
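
To make the gap between average loss and worst-case subpopulation loss concrete, the sketch below compares the two on a toy list of per-example losses. It is an illustration only, not the authors' procedure: it treats a "subpopulation" as any subset containing at least an `alpha` fraction of the examples, which ignores the covariate structure the paper works with. The function names, the `alpha` parameter, and the loss values are all hypothetical.

```python
import math


def average_loss(losses):
    """Plain empirical average of per-example losses."""
    return sum(losses) / len(losses)


def worst_case_subpopulation_loss(losses, alpha):
    """Largest average loss over any subset holding at least an alpha
    fraction of the examples.

    Assumption for illustration: subsets are arbitrary, so the maximum is
    attained by the k examples with the highest loss, k = ceil(alpha * n).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    n = len(losses)
    # Small tolerance guards against float error pushing ceil up by one.
    k = max(1, math.ceil(alpha * n - 1e-9))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical per-example losses: most examples are easy, two are hard.
losses = [0.1, 0.2, 0.1, 0.3, 2.0, 1.8, 0.2, 0.1, 0.2, 0.0]

print("average loss:", average_loss(losses))
print("worst 20% loss:", worst_case_subpopulation_loss(losses, 0.2))
```

On this toy data the average looks small while the worst 20% of examples carries a much larger loss, which is the failure mode of average-loss minimization that the abstract describes.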

Code & Models

Repositories

hsnamkoong/marginal-dro
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies