Identifying and Correcting Label Bias in Machine Learning

Heinrich Jiang; Ofir Nachum

arXiv:1901.04966 · cs.LG · January 16, 2019 · 116 cites

TL;DR

This paper presents a mathematical approach to identify and correct label bias in datasets, enabling the training of fair classifiers through re-weighting data points without altering labels.

Contribution

It introduces a theoretical framework for bias correction via re-weighting, with guarantees that training on the re-weighted data corresponds to training on the unobserved, unbiased labels, which leads to fairer classifiers.

Findings

01 Re-weighting data points can correct label bias without changing labels.

02 The method is fast, robust, and compatible with virtually any learning algorithm.

03 The method outperforms standard fairness approaches on multiple datasets.

Abstract

Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate…
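The re-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes demographic parity as the fairness notion, a plain NumPy logistic regression as the learner, and a simplified multiplier update. All function names and parameters (`example_weights`, `fit_weighted_logreg`, `train_reweighted`, `eta`, `rounds`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def example_weights(labels, groups, lam):
    """Per-example weights from one multiplier per group (hypothetical helper).

    A positive lam[g] up-weights positive-labelled examples of group g and
    down-weights its negative-labelled ones. The labels are never modified.
    """
    w_tilde = np.exp(lam[groups])
    return np.where(labels == 1, w_tilde / (1.0 + w_tilde), 1.0 / (1.0 + w_tilde))


def add_bias(X):
    """Append a constant column so the model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_weighted_logreg(X, y, w, steps=500, lr=0.5):
    """Weighted logistic regression fitted by plain gradient descent."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
        grad = Xb.T @ (w * (p - y)) / w.sum()
        theta -= lr * grad
    return theta


def train_reweighted(X, y, groups, n_groups, rounds=20, eta=1.0):
    """Alternate between fitting on weighted data and updating the multipliers.

    Assumption: groups whose positive-prediction rate is below the overall
    rate get their multiplier raised, and vice versa.
    """
    lam = np.zeros(n_groups)
    for _ in range(rounds):
        w = example_weights(y, groups, lam)
        theta = fit_weighted_logreg(X, y, w)
        pred = (add_bias(X) @ theta) > 0
        rates = np.array([pred[groups == g].mean() for g in range(n_groups)])
        lam -= eta * (rates - pred.mean())
    return theta, lam, w
```

With all multipliers at zero, every example receives the same weight, so the first round is ordinary unweighted training; later rounds only change how much each observed example counts, never its label.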

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)