Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
Dang Nguyen, Paymon Haddad, Eric Gan, Baharan Mirzasoleiman

TL;DR
This paper shows that modifying the training data distribution can reduce simplicity bias and improve in-distribution generalization. It compares the inductive biases of gradient descent (GD) and sharpness-aware minimization (SAM), and proposes a clustering-based upsampling method.
Contribution
It introduces a novel method that changes the training data distribution by upsampling examples outside a cluster identified early in training, which alleviates simplicity bias and improves generalization across various models and datasets.
Findings
SAM learns features more uniformly than GD in early training.
The proposed method improves generalization performance across multiple datasets.
Combining the method with existing strategies achieves state-of-the-art results.
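For reference, the two optimizers compared in the findings above differ only in where the gradient is evaluated. The following is a minimal sketch, assuming the standard SAM update (perturb the weights by a radius `rho` along the normalized gradient, then descend using the gradient at the perturbed point). The toy quadratic loss and the names `gd_step`, `sam_step`, `lr`, and `rho` are illustrative assumptions and are not taken from the paper.

```python
import math


def quadratic_grad(w):
    """Gradient of the toy loss 0.5 * sum(w_i ** 2) (illustrative only)."""
    return list(w)


def gd_step(w, grad_fn, lr=0.1):
    """One plain gradient descent step: w <- w - lr * grad(w)."""
    g = grad_fn(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: take the gradient at w + rho * g / ||g||, apply it at w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # guard against a zero gradient
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_perturbed = grad_fn(w_perturbed)
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]


w0 = [1.0, 0.0]
w_gd = gd_step(w0, quadratic_grad)
w_sam = sam_step(w0, quadratic_grad)
```

On this toy loss the two steps differ only slightly. The sketch shows the mechanics of each update, not the feature-learning behavior analyzed in the paper.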
Abstract
Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we rigorously prove that SAM learns different features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. We also show that examples containing features that are learned early are separable from the rest based on the model's output. Based on this observation, we propose a method that (i) clusters examples based on the network output early in training, (ii) identifies a cluster of examples with similar network output, and (iii) upsamples the rest of…
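The three steps listed in the abstract can be sketched roughly as follows. This is a hedged illustration under several assumptions that the truncated abstract does not confirm: two clusters are used, the cluster "with similar network output" is taken to be the one with the smaller mean squared distance to its centroid, the remaining examples are repeated `factor` times, and clustering is done over all examples at once rather than per class. The function names `two_means` and `upsample_indices` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means(outputs, iters=20):
    """Plain 2-means on per-example network outputs (deterministic init)."""
    c0 = outputs[0]
    c1 = max(outputs, key=lambda o: sq_dist(o, c0))  # farthest point from c0
    centroids = [c0, c1]
    labels = [0] * len(outputs)
    for _ in range(iters):
        labels = [
            0 if sq_dist(o, centroids[0]) <= sq_dist(o, centroids[1]) else 1
            for o in outputs
        ]
        for k in (0, 1):
            members = [o for o, lab in zip(outputs, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids


def upsample_indices(outputs, factor=2):
    """Return training indices with the non-tight cluster repeated `factor` times."""
    labels, centroids = two_means(outputs)
    spread = []
    for k in (0, 1):
        members = [o for o, lab in zip(outputs, labels) if lab == k]
        if members:
            spread.append(sum(sq_dist(o, centroids[k]) for o in members) / len(members))
        else:
            spread.append(float("inf"))  # an empty cluster is never the tight one
    tight = 0 if spread[0] <= spread[1] else 1  # assumed "similar output" cluster
    indices = []
    for i, lab in enumerate(labels):
        indices.extend([i] * (1 if lab == tight else factor))
    return indices


# Toy softmax outputs recorded early in training (made-up values).
early_outputs = [[0.9, 0.1], [0.91, 0.09], [0.89, 0.11], [0.5, 0.5], [0.3, 0.7]]
resampled = upsample_indices(early_outputs)
```

The returned index list could then be used to build the modified training set, for example by indexing into the original dataset before the remaining epochs.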
Taxonomy
Topics: Music and Audio Processing
Methods: Sharpness-Aware Minimization
