More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Yunyi Li; Maria De-Arteaga; Maytal Saar-Tsechansky

arXiv:2207.07723·cs.LG·July 11, 2023

TL;DR

This paper investigates how active data collection strategies for fairness can inadvertently worsen bias if label bias is ignored, emphasizing the need to explicitly address label bias in fairness-aware learning.

Contribution

It provides a comprehensive overview of label bias types and empirically demonstrates the risks of neglecting label bias in active data acquisition for fairness.

Findings

01

Collecting more data without considering label bias can increase bias.

02

Fairness constraints based solely on observed labels may fail to mitigate bias.

03

Ignoring label bias during data collection can lead to unintended bias amplification.
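The second finding can be illustrated with a toy calculation. The sketch below is not the paper's experiment: the two groups, their sizes, and the number of mislabeled cases are hypothetical values chosen only to make the arithmetic easy to follow. It assumes a model that reproduces the observed labels exactly, and compares the true-positive-rate gap between groups when measured against observed labels versus the (normally unavailable) true labels.

```python
def true_positive_rate(labels, preds):
    """Share of examples with label 1 that the model also predicts as 1."""
    preds_on_positives = [p for y, p in zip(labels, preds) if y == 1]
    return sum(preds_on_positives) / len(preds_on_positives)

# Hypothetical toy data: 10 people per group, 5 true positives in each.
true_a = [1] * 5 + [0] * 5
true_b = [1] * 5 + [0] * 5

# Assumed label bias: group A is recorded correctly, while 2 of group B's
# 5 true positives are recorded as negatives.
obs_a = list(true_a)
obs_b = [1] * 3 + [0] * 7

# Assumed model: it fits the observed labels perfectly.
pred_a = list(obs_a)
pred_b = list(obs_b)

# Gap in true positive rate between groups, measured two ways.
gap_observed = abs(true_positive_rate(obs_a, pred_a) - true_positive_rate(obs_b, pred_b))
gap_true = abs(true_positive_rate(true_a, pred_a) - true_positive_rate(true_b, pred_b))

print(f"TPR gap on observed labels: {gap_observed:.2f}")
print(f"TPR gap on true labels:     {gap_true:.2f}")
```

Under these assumptions the model looks perfectly fair when audited against the observed labels, yet it misses two of group B's five true positives. A fairness constraint evaluated only on observed labels would be satisfied here without removing the disparity.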

Abstract

An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias,…

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Explainable Artificial Intelligence (XAI)