Distributionally Robust Losses for Latent Covariate Mixtures
John Duchi, Tatsunori Hashimoto, Hongseok Namkoong

TL;DR
This paper proposes a convex optimization procedure that controls the worst-case loss over all subpopulations of a given size in a heterogeneous dataset, with finite-sample convergence guarantees and empirical validation on three prediction tasks.
Contribution
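It proposes a convex procedure that controls the worst-case loss over all subpopulations of a given size, with finite-sample (nonparametric) convergence guarantees for the worst-off subpopulation. This targets settings such as data drawn from multiple demographic groups, where minimizing average loss does not guarantee uniformly low loss.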
Findings
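Models learned with the worst-case procedure do well against unseen subpopulations.
Finite-sample (nonparametric) convergence guarantees are established for the worst-off subpopulation.
These empirical results are observed on lexical similarity, wine quality, and recidivism prediction tasks.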
Abstract
While modern large-scale datasets often consist of heterogeneous subpopulations -- for example, multiple demographic groups or multiple text corpora -- the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
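To make "worst-case performance over all subpopulations of a given size" concrete, the sketch below computes a simpler, related quantity: the average loss over the worst-off fraction `alpha` of a sample, also known as the empirical conditional value-at-risk (CVaR). This is an illustration under stated assumptions, not the authors' estimator. It treats any reweighting of individual examples as a valid subpopulation, whereas the paper's title indicates that its subpopulations are defined through covariates. The function names and the parameter `alpha` are hypothetical.

```python
import numpy as np


def worst_case_subpopulation_loss(losses, alpha):
    """Average loss over the worst-off alpha-fraction of examples (empirical CVaR).

    Illustrative only: every subset (or fractional reweighting) of the sample
    with total mass alpha counts as a subpopulation here.
    """
    losses = np.asarray(losses, dtype=float)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = losses.size
    sorted_desc = np.sort(losses)[::-1]  # largest losses first
    k = alpha * n                        # subpopulation size, possibly fractional
    full = int(np.floor(k))              # examples included with full weight
    total = sorted_desc[:full].sum()
    if full < n and k > full:
        # The next-worst example enters with the leftover fractional weight.
        total += (k - full) * sorted_desc[full]
    return total / k


def worst_case_loss_dual(losses, alpha):
    """Same quantity via the convex dual: min over eta of eta + E[(loss - eta)_+] / alpha.

    The objective is piecewise linear and convex in eta, so the minimum is
    attained at one of the observed loss values.
    """
    losses = np.asarray(losses, dtype=float)
    return min(eta + np.maximum(losses - eta, 0.0).mean() / alpha for eta in losses)


if __name__ == "__main__":
    per_example_losses = [1.0, 2.0, 3.0, 4.0]
    print("average loss:", np.mean(per_example_losses))
    print("worst half:  ", worst_case_subpopulation_loss(per_example_losses, 0.5))
    print("dual form:   ", worst_case_loss_dual(per_example_losses, 0.5))
```

The dual form is convex in both the threshold `eta` and, for a convex loss, the model parameters, which is what makes this kind of worst-case objective amenable to convex optimization. The average loss can be small while the worst-half loss stays large, which is the gap the abstract describes.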
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
