A survey on bias in machine learning research
Agnieszka Mikołajczyk-Bareła, Michał Grochowski

TL;DR
This survey reviews potential sources of bias in machine learning pipelines, emphasizing the importance of understanding bias origins to develop better detection and mitigation methods for fairer, more transparent models.
Contribution
It provides a comprehensive taxonomy of bias sources in ML pipelines, bridging gaps in existing literature and offering clear examples for each source.
Findings
Identifies over forty potential bias sources in ML pipelines
Highlights the importance of understanding bias origins for mitigation
Provides a taxonomy and examples of bias sources
Abstract
Current research on bias in machine learning often focuses on fairness while overlooking the roots or causes of bias. However, bias was originally defined as a "systematic error," often caused by humans at different stages of the research process. This article aims to bridge gaps in the past literature on bias in research by providing a taxonomy of potential sources of bias and errors in data and models. The paper focuses on bias in machine learning pipelines. The survey analyzes over forty potential sources of bias in the machine learning (ML) pipeline, providing clear examples for each. By understanding the sources and consequences of bias in machine learning, better methods can be developed for detecting and mitigating it, leading to fairer, more transparent, and more accurate ML models.
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms
Methods: Focus
