Lazy Data Practices Harm Fairness Research

Jan Simson; Alessandro Fabris; Christoph Kern

arXiv:2404.17293 · cs.LG · June 21, 2024

TL;DR

This paper critically examines how common, unreflective data practices in fair ML limit the reach and reliability of algorithmic fairness findings. It highlights the under-representation of certain protected attributes, the exclusion of minorities during preprocessing, and opaque data handling, and it proposes recommendations for responsible data use.

Contribution

The paper systematically analyzes how tabular datasets are used in fair ML (280 experiments across 142 publications), identifies key shortcomings, and offers guidelines to improve transparency and inclusivity in data practices.

Findings

01. Certain protected attributes are under-represented in both data and evaluations.

02. Minorities are frequently excluded during data preprocessing.

03. Opaque data processing threatens the generalization of fairness research.
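
The second finding can be made concrete with a small audit. The sketch below is not taken from the paper or its repository. It assumes a toy dataset, a hypothetical `race` column, and a hypothetical preprocessing shortcut (`keep_largest_groups`) that keeps only the most frequent groups. It then reports how many rows of each group survive the step.

```python
from collections import Counter

def group_counts(rows, attribute):
    """Count rows per value of a protected attribute."""
    return Counter(row[attribute] for row in rows)

def keep_largest_groups(rows, attribute, k=2):
    """Hypothetical preprocessing shortcut: keep only the k most frequent groups."""
    largest = {group for group, _ in group_counts(rows, attribute).most_common(k)}
    return [row for row in rows if row[attribute] in largest]

def exclusion_report(before, after, attribute):
    """Per group: row count before and after a step, and the share retained."""
    counts_before = group_counts(before, attribute)
    counts_after = group_counts(after, attribute)
    report = {}
    for group, n_before in counts_before.items():
        n_after = counts_after.get(group, 0)
        report[group] = {
            "before": n_before,
            "after": n_after,
            "retained": n_after / n_before,
        }
    return report

# Toy data (made up for illustration): group sizes A=5, B=3, C=1, D=1.
rows = [{"race": g} for g in ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]]

filtered = keep_largest_groups(rows, "race", k=2)
report = exclusion_report(rows, filtered, "race")

for group, stats in sorted(report.items()):
    print(group, stats)
```

Running a report like this before and after each preprocessing step, and publishing it alongside the results, is one way to make the exclusion of small groups visible instead of implicit.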

Abstract

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization…

Code & Models

Repositories

reliable-ai/lazy-data-practices (official)

Taxonomy

Methods: Sparse Evolutionary Training