Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization

Dang Nguyen; Paymon Haddad; Eric Gan; Baharan Mirzasoleiman

arXiv:2404.17768 · cs.LG · March 3, 2026 · 1 citation

TL;DR

This paper shows that modifying the training data distribution can reduce simplicity bias and improve in-distribution generalization. It compares the inductive biases of gradient descent (GD) and sharpness-aware minimization (SAM), then proposes a clustering-based upsampling method.

Contribution

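It introduces a method that clusters examples by network output early in training and upsamples the examples outside a cluster with similar network output, which alleviates simplicity bias and improves generalization across various models and datasets.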

Findings

01. SAM learns features more uniformly than GD in early training.

02. The proposed method improves generalization performance across multiple datasets.

03. Combining the method with existing strategies achieves state-of-the-art results.
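
Finding 01 contrasts SAM with GD. For readers unfamiliar with the two updates, the sketch below shows one step of each on a toy quadratic loss. It uses the commonly cited form of SAM (perturb the weights along the normalized gradient by radius rho, then descend using the gradient at the perturbed point). The quadratic loss, the matrix A, and the function names gd_step, sam_step, and quad_grad are assumptions made for illustration; the sketch does not reproduce the paper's two-layer CNN analysis.

```python
import numpy as np

def gd_step(w, grad_fn, lr):
    """One plain gradient descent step."""
    return w - lr * grad_fn(w)

def sam_step(w, grad_fn, lr, rho):
    """One SAM step: ascend to a nearby perturbed point, then descend
    from the original weights using the gradient taken at that point."""
    g = grad_fn(w)
    # Perturbation of norm rho along the gradient direction
    # (small constant avoids division by zero).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_fn(w + eps)

# Hypothetical toy loss L(w) = 0.5 * w^T A w, whose gradient is A w.
A = np.diag([1.0, 10.0])

def quad_grad(w):
    return A @ w

w0 = np.array([1.0, 1.0])
w_gd = gd_step(w0, quad_grad, lr=0.1)
w_sam = sam_step(w0, quad_grad, lr=0.1, rho=0.05)
```

With rho set to 0 the SAM step reduces to the GD step, which makes the perturbation radius the only difference between the two updates in this sketch.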

Abstract

Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we rigorously prove that SAM learns different features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. We also show that examples containing features that are learned early are separable from the rest based on the model's output. Based on this observation, we propose a method that (i) clusters examples based on the network output early in training, (ii) identifies a cluster of examples with similar network output, and (iii) upsamples the rest of…
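
The three steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract is truncated before the details, so the use of two clusters, the plain k-means routine, the rule that the "similar output" cluster is the one with the smallest average distance to its centroid, and upsampling by duplicating each remaining example once are all assumptions. The names two_means and upsample_indices are hypothetical.

```python
import numpy as np

def two_means(x, n_iter=50, seed=0):
    """Plain k-means with k=2 (an assumed choice); returns a 0/1 label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=2, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Distance from every example to both centers, shape (n, 2).
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels

def upsample_indices(outputs, n_iter=50, seed=0):
    """Return training indices in which examples outside the tightest
    cluster appear twice (assumed upsampling factor of 2)."""
    # (i) Cluster examples by network output recorded early in training.
    labels = two_means(outputs, n_iter=n_iter, seed=seed)
    # (ii) Assumption: the cluster with similar outputs is the one with
    # the smallest mean distance to its own centroid.
    spread = []
    for k in range(2):
        members = outputs[labels == k]
        if len(members) == 0:
            spread.append(np.inf)
        else:
            center = members.mean(axis=0)
            spread.append(np.linalg.norm(members - center, axis=1).mean())
    tight = int(np.argmin(spread))
    # (iii) Upsample the remaining examples by repeating their indices once.
    rest = np.flatnonzero(labels != tight)
    return np.concatenate([np.arange(len(outputs)), rest])
```

Here outputs is assumed to be an array of shape (num_examples, num_outputs) holding the model's outputs from an early epoch; the returned index array could be passed to a data loader to draw the upsampled training set.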

Videos

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization · SlidesLive

Taxonomy

Topics: Music and Audio Processing

Methods: Sharpness-Aware Minimization