Algorithmic Fairness Datasets: the Story so Far

Alessandro Fabris; Stefano Messina; Gianmaria Silvello; Gian Antonio Susto

arXiv:2202.01711·cs.CY·September 27, 2022

TL;DR

This paper surveys over two hundred datasets used in algorithmic fairness research, providing standardized documentation and analysis to address data documentation gaps and improve dataset understanding for fair machine learning.

Contribution

It offers comprehensive, standardized documentation of key fairness datasets and analyzes their properties, limitations, and ethical considerations, unifying prior scholarship.

Findings

01. Identified the three most popular fairness datasets: Adult, COMPAS, and German Credit.

02. Provided standardized documentation and analysis of over two hundred datasets and their properties.

03. Highlighted best practices for dataset curation in fairness research.

Abstract

Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being. As a result, a growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented. Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity). In this work, we target data documentation debt by surveying over two hundred datasets employed in algorithmic fairness research, and producing standardized and searchable…
