TL;DR
This survey reviews real-world tabular datasets used in fairness-aware machine learning, analyzing their attributes and biases to support empirical evaluation of fairness interventions.
Contribution
It provides a comprehensive overview of datasets, explores attribute relationships, and investigates biases to aid fair ML research and benchmarking.
Findings
Identifies relationships between dataset attributes using Bayesian networks
Analyzes bias and attribute interactions through exploratory analysis
Highlights the importance of diverse datasets for fairness evaluation
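This summary mentions exploratory bias analysis but does not name the measures used. As a minimal sketch, assuming a binary class label and a single protected attribute with two groups, two common descriptive checks are the positive-class rate per group (whose difference is often called statistical parity difference) and the class imbalance ratio. The records and function names below are hypothetical illustrations, not the survey's code or data.

```python
from collections import Counter

# Hypothetical toy records: (protected attribute value, class label).
# Illustrative only; not taken from any dataset covered by the survey.
records = [
    ("female", 1), ("female", 0), ("female", 0), ("female", 0),
    ("male", 1), ("male", 1), ("male", 0), ("male", 0),
]


def positive_rate(rows, group):
    """Fraction of rows in `group` whose class label is 1."""
    labels = [y for g, y in rows if g == group]
    return sum(labels) / len(labels)


def statistical_parity_difference(rows, protected, non_protected):
    """P(y=1 | non-protected) - P(y=1 | protected); 0 means parity."""
    return positive_rate(rows, non_protected) - positive_rate(rows, protected)


def imbalance_ratio(rows):
    """Majority-class count divided by minority-class count (assumes both classes occur)."""
    counts = Counter(y for _, y in rows)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    print(positive_rate(records, "female"))  # 0.25
    print(positive_rate(records, "male"))  # 0.5
    print(statistical_parity_difference(records, "female", "male"))  # 0.25
    print(round(imbalance_ratio(records), 3))  # 1.667
```

On real benchmark data the same per-group comparison would typically be repeated for each protected attribute; a positive difference here indicates that the non-protected group receives the positive class more often.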
Abstract
As decision-making increasingly relies on Machine Learning (ML) and (big) data, the issue of fairness in data-driven Artificial Intelligence (AI) systems is receiving increasing attention from both research and industry. A large variety of fairness-aware machine learning solutions have been proposed, which involve fairness-related interventions in the data, the learning algorithms and/or the model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware machine learning. We focus on tabular data as the most common data representation for fairness-aware machine learning. We start our analysis by identifying relationships between the different attributes, particularly w.r.t. protected attributes and the class attribute,…
