Shedding light on underrepresentation and Sampling Bias in machine learning

Sami Zhioua; Rūta Binkytė

arXiv:2306.05068·cs.LG·June 9, 2023·2 cites

Open Access

TL;DR

This paper clarifies different types of sampling bias in machine learning, analyzes their impact on fairness, and questions the effectiveness of simply collecting more data from underrepresented groups to reduce discrimination.

Contribution

It introduces clear definitions for sample size bias and underrepresentation bias, and analyzes how these biases affect fairness and model discrimination.

Findings

01

Discrimination can be decomposed into variance, bias, and noise.

02

Sampling bias affects fairness differently across groups.

03

Collecting more data from underrepresented groups may not always mitigate discrimination.

Abstract

Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is assumed that bias resulting from machine learning is borne equally by different groups (e.g., females vs. males, whites vs. blacks, etc.). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. The term sampling bias is used inconsistently in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely, sample size bias (SSB) and underrepresentation bias (URB). We also show how discrimination can be decomposed into variance, bias, and noise.…
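To make the abstract's concern about measuring discrimination more concrete, here is a minimal simulation sketch. It is not the authors' method or their definitions of SSB and URB. It assumes statistical parity difference as the discrimination measure, a synthetic population with a true disparity of 0.2, and hypothetical helper names (`draw_sample`, `spread_of_estimates`). It only shows that the spread of the measured disparity depends on both the total sample size and the share of the smaller group.

```python
import random
import statistics


def statistical_parity_difference(sample):
    """Positive-outcome rate of group 0 minus that of group 1."""
    rates = []
    for g in (0, 1):
        outcomes = [y for grp, y in sample if grp == g]
        if not outcomes:
            return None  # undefined when a group is absent from the sample
        rates.append(sum(outcomes) / len(outcomes))
    return rates[0] - rates[1]


def draw_sample(n, minority_share, rng, p_pos=(0.6, 0.4)):
    """Draw n (group, outcome) pairs from an assumed synthetic population.

    Group 1 appears with probability minority_share; p_pos holds the
    assumed positive-outcome rates, so the true disparity is 0.2.
    """
    sample = []
    for _ in range(n):
        g = 1 if rng.random() < minority_share else 0
        y = 1 if rng.random() < p_pos[g] else 0
        sample.append((g, y))
    return sample


def spread_of_estimates(n, minority_share, trials=500, seed=0):
    """Mean and standard deviation of the measured disparity over many samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        spd = statistical_parity_difference(draw_sample(n, minority_share, rng))
        if spd is not None:
            estimates.append(spd)
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n, share in [(50, 0.5), (2000, 0.5), (500, 0.5), (500, 0.05)]:
        mean, sd = spread_of_estimates(n, share)
        print(f"n={n:5d} minority_share={share:.2f} mean={mean:+.3f} sd={sd:.3f}")
```

In this toy setting, comparing the rows with the same minority share but different `n` isolates the effect of sample size, while comparing the two `n=500` rows isolates the effect of group share at a fixed sample size.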

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI