Simplicity Bias Leads to Amplified Performance Disparities

Samuel J. Bell; Levent Sagun

arXiv:2212.06641·cs.LG·June 9, 2023

TL;DR

This paper shows that SGD-trained models tend to prioritize the parts of a dataset they find simple, which amplifies performance disparities across groups even in balanced datasets and motivates model-aware fairness strategies.

Contribution

The paper introduces difficulty disparity, the case where subsets of differing complexity align with demographic groups. It shows how a model's bias towards simplicity can widen performance gaps between those groups, and it quantifies this amplification across models and datasets.

Findings

1. Difficulty disparity occurs even in balanced datasets.
2. Model choice amplifies existing performance disparities.
3. Real-world examples demonstrate increased group performance gaps.

Abstract

Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for "easy" runs far deeper: a model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex, as measured by performance difference on the test set. When subsets with different levels of complexity align with demographic groups, we term this difficulty disparity, a phenomenon that occurs even with balanced datasets that lack group/label associations. We show how difficulty disparity is a model-dependent quantity, and is further amplified in commonly used models as selected by typical average performance scores. We quantify an amplification factor across a range of…
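
The abstract measures disparity as a performance difference between groups on the test set. As a minimal sketch of that idea, the Python below computes per-group accuracy and the gap between the best and worst group. The helper names (`group_accuracies`, `disparity`), the toy labels, and the two sets of predictions are all hypothetical, and the final ratio of gaps is only an illustrative stand-in: the excerpt does not give the paper's actual definition of its amplification factor.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each group label to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def disparity(accs):
    """Gap between the best- and worst-performing group."""
    return max(accs.values()) - min(accs.values())


# Toy data (made up for illustration): a balanced dataset with two
# groups of equal size and identical label distributions.
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Hypothetical predictions from two different models on the same test set.
pred_model_1 = [0, 1, 0, 1, 0, 1, 1, 1]
pred_model_2 = [0, 1, 0, 1, 1, 0, 1, 1]

gap_1 = disparity(group_accuracies(y_true, pred_model_1, groups))
gap_2 = disparity(group_accuracies(y_true, pred_model_2, groups))

# Assumption: a simple ratio of gaps, NOT the paper's amplification factor.
gap_ratio = gap_2 / gap_1

print(f"model 1 gap: {gap_1:.2f}")
print(f"model 2 gap: {gap_2:.2f}")
print(f"gap ratio (model 2 / model 1): {gap_ratio:.2f}")
```

Both models see the same balanced data, yet the gap between groups differs by model, which is the sense in which the abstract calls difficulty disparity a model-dependent quantity.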

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification

Methods: ALIGN · Test