Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel

TL;DR
This paper introduces a method to mitigate the simplicity bias of neural networks: a set of models is trained jointly with a penalty on the alignment of their input gradients, so that each model fits the data in a different way. This improves out-of-distribution (OOD) generalization and yields state-of-the-art results on visual recognition tasks with biased data.
Contribution
The paper proposes reducing the simplicity bias by training multiple models with a penalty on the alignment of their input gradients, which improves OOD generalization without requiring multiple training environments.
Findings
Improved OOD generalization in visual recognition tasks.
Achieved state-of-the-art results on biased datasets.
Demonstrated theoretical and empirical benefits of diverse model training.
Abstract
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side…
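The penalty on the alignment of input gradients described in the abstract can be illustrated with a minimal sketch. The exact form of the penalty is not given in this summary, so the version below assumes the mean squared cosine similarity between the input gradients of every pair of models. The tiny tanh network and the names `mlp_input_gradient`, `cosine`, and `alignment_penalty` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def mlp_input_gradient(W, v, x):
    """Gradient of f(x) = v . tanh(W x) with respect to the input x.

    W is a list of rows (hidden units), v the output weights, x the input.
    This toy model is an assumption made for the example.
    """
    hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]
    # Chain rule: df/dx_j = sum_i v_i * (1 - tanh(.)^2) * W_ij
    scale = [vi * (1.0 - h * h) for vi, h in zip(v, hidden)]
    return [sum(s * row[j] for s, row in zip(scale, W)) for j in range(len(x))]


def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids 0/0)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)


def alignment_penalty(gradients):
    """Mean squared cosine similarity over all pairs of input gradients.

    Assumed form of the penalty: 0 when all pairs are orthogonal,
    1 when all gradients are parallel.
    """
    n = len(gradients)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(gradients[i], gradients[j]) ** 2 for i, j in pairs) / len(pairs)


# Two toy models evaluated at the same input point.
x = [0.2, -0.1]
model_a = ([[0.5, -1.0], [2.0, 0.3]], [1.0, -0.5])
model_b = ([[-0.7, 0.4], [0.1, 1.5]], [0.8, 0.6])
grads = [mlp_input_gradient(W, v, x) for W, v in (model_a, model_b)]
penalty = alignment_penalty(grads)
```

In training, a term like `penalty` would be added to the sum of the models' task losses with a weighting coefficient, so that the models are pushed toward relying on different input features while each still fits the data.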
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · AI in cancer detection · Digital Imaging for Blood Diseases
Methods: Stochastic Gradient Descent
