Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

Damien Teney; Ehsan Abbasnejad; Simon Lucey; Anton van den Hengel

arXiv:2105.05612·cs.LG·September 13, 2022

TL;DR

The paper mitigates the simplicity bias of neural networks by training a set of models with a penalty on the alignment of their input gradients. This improves out-of-distribution (OOD) generalization and yields state-of-the-art results on visual recognition tasks with biased data.

Contribution

The paper proposes reducing the simplicity bias by training multiple models with a penalty on the alignment of their input gradients, improving OOD generalization without requiring multiple training environments.

Findings

01. Improved OOD generalization in visual recognition tasks.
02. Achieved state-of-the-art results on biased datasets.
03. Demonstrated theoretical and empirical benefits of diverse model training.

Abstract

Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side…
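
The penalty on the alignment of input gradients mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the models are tiny logistic classifiers whose input gradient has a closed form, alignment is measured as the squared cosine similarity between the gradients of each pair of models, and the names `input_gradient`, `squared_cosine` and `alignment_penalty` are hypothetical. The paper's exact penalty may differ.

```python
import math
from itertools import combinations


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x):
    """Gradient of f(x) = sigmoid(w . x + b) with respect to the input x."""
    s = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [s * (1.0 - s) * wi for wi in w]


def squared_cosine(u, v, eps=1e-12):
    """Squared cosine similarity: 0 for orthogonal vectors, ~1 for parallel ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    return dot * dot / (norm_u_sq * norm_v_sq + eps)


def alignment_penalty(models, xs):
    """Mean pairwise alignment of input gradients over examples and model pairs.

    models: list of (w, b) tuples, one per model (assumed logistic models).
    xs:     list of input vectors.
    """
    pairs = list(combinations(range(len(models)), 2))
    total = 0.0
    for x in xs:
        grads = [input_gradient(w, b, x) for (w, b) in models]
        for i, j in pairs:
            total += squared_cosine(grads[i], grads[j])
    return total / (len(xs) * len(pairs))


if __name__ == "__main__":
    xs = [[0.5, -1.0], [2.0, 0.3]]
    # Two models that rely on the same input feature: gradients are parallel.
    aligned = [([1.0, 0.0], 0.0), ([2.0, 0.0], 0.1)]
    # Two models that rely on different input features: gradients are orthogonal.
    diverse = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
    print("aligned penalty:", alignment_penalty(aligned, xs))
    print("diverse penalty:", alignment_penalty(diverse, xs))
```

In a training setup along these lines, the penalty would be added, with some weight, to the sum of the models' ordinary classification losses, so that each model still fits the data while being pushed to depend on different input features than the others. For deep networks the input gradients would come from automatic differentiation rather than a closed form.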

Code & Models

Repositories

dteney/collages-dataset (official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · AI in cancer detection · Digital Imaging for Blood Diseases

Methods: Stochastic Gradient Descent