Understanding Why Generalized Reweighting Does Not Improve Over ERM
Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

TL;DR
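This paper investigates why generalized reweighting (GRW) methods, such as importance weighting and variants of distributionally robust optimization (DRO), do not outperform empirical risk minimization (ERM) under distribution shift. It shows that when overparameterized models are trained with GRW, the resulting models are close to those obtained by ERM, so these methods offer little gain in robust generalization.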
Contribution
The paper provides a theoretical analysis showing that broad classes of generalized reweighting algorithms yield models similar to ERM, explaining their limited effectiveness in distributional robustness.
Findings
GRW algorithms produce models close to ERM in overparameterized settings
Adding small regularization does not significantly improve robustness
GRW approaches are fundamentally limited in achieving distributionally robust generalization
Abstract
Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift, where the training and the test distributions are different. A suite of approaches, such as importance weighting and variants of distributionally robust optimization (DRO), have been proposed to solve this problem. But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift. The goal of this work is to obtain a comprehensive theoretical understanding of this intriguing phenomenon. We first posit the class of Generalized Reweighting (GRW) algorithms as a broad category of approaches that iteratively update model parameters based on iterative reweighting of the training samples. We show that when overparameterized models are trained under GRW, the resulting models are close to those obtained by ERM. We…
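The claim that GRW-trained overparameterized models end up close to ERM can be illustrated with a small numerical sketch. This is not the paper's code or experimental setup. It assumes the simplest case: a linear model with more parameters than samples, squared loss, zero initialization, and plain gradient descent. The reweighting rule `grw_weights` is a hypothetical one chosen for illustration (a softmax over per-sample losses mixed with a uniform floor so that every sample keeps positive weight). In this setting every update lies in the row space of the data matrix, so both runs converge to the same minimum-norm interpolating solution.

```python
import numpy as np


def train_linear(X, y, weight_fn, steps=20000, lr=0.01):
    """Gradient descent on a per-sample weighted squared loss, from zero init.

    weight_fn(residuals) returns positive sample weights that sum to 1.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        residuals = X @ theta - y
        q = weight_fn(residuals)            # sample weights for this step
        theta -= lr * X.T @ (q * residuals)  # weighted gradient step
    return theta


def erm_weights(residuals):
    """ERM: every sample gets the same weight at every step."""
    n = residuals.shape[0]
    return np.full(n, 1.0 / n)


def grw_weights(residuals):
    """Hypothetical reweighting rule (an assumption, not from the paper):
    upweight high-loss samples via a softmax, mixed with a uniform floor."""
    losses = residuals ** 2
    z = np.exp(losses - losses.max())        # numerically stable softmax
    soft = z / z.sum()
    n = residuals.shape[0]
    return 0.5 * soft + 0.5 / n


def run_demo(seed=0, n=10, d=50):
    """Train with ERM and with the reweighting rule on the same random data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # overparameterized: d > n
    y = rng.standard_normal(n)
    theta_erm = train_linear(X, y, erm_weights)
    theta_grw = train_linear(X, y, grw_weights)
    theta_min_norm = np.linalg.pinv(X) @ y   # minimum-norm interpolator
    return theta_erm, theta_grw, theta_min_norm


if __name__ == "__main__":
    erm, grw, min_norm = run_demo()
    print("||theta_GRW - theta_ERM||     =", np.linalg.norm(grw - erm))
    print("||theta_ERM - theta_min_norm|| =", np.linalg.norm(erm - min_norm))
```

Both printed distances should be numerically negligible in this toy setting. The sketch covers only the linear, squared-loss case and says nothing about the other model classes or losses the paper analyzes.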
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
