A survey on bias in machine learning research

Agnieszka Mikołajczyk-Bareła; Michał Grochowski

arXiv:2308.11254·cs.LG·August 23, 2023

Open Access

TL;DR

This survey reviews potential sources of bias in machine learning pipelines, emphasizing the importance of understanding bias origins to develop better detection and mitigation methods for fairer, more transparent models.

Contribution

It provides a comprehensive taxonomy of bias sources in ML pipelines, bridging gaps in existing literature and offering clear examples for each source.

Findings

01. Identifies over forty potential bias sources in ML pipelines

02. Highlights the importance of understanding bias origins for mitigation

03. Provides a taxonomy and examples of bias sources

Abstract

Current research on bias in machine learning often focuses on fairness while overlooking the roots or causes of bias. However, bias was originally defined as a "systematic error," often caused by humans at different stages of the research process. This article aims to bridge the gap to past literature on bias in research by providing a taxonomy of potential sources of bias and errors in data and models. The paper focuses on bias in machine learning pipelines. The survey analyses over forty potential sources of bias in the machine learning (ML) pipeline, providing clear examples for each. By understanding the sources and consequences of bias in machine learning, better methods can be developed for detecting and mitigating it, leading to fairer, more transparent, and more accurate ML models.

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms

Methods: Focus