Introducing a Family of Synthetic Datasets for Research on Bias in Machine Learning
William Blanzeisky, Pádraig Cunningham, Kenneth Kennedy

TL;DR
This paper introduces a family of synthetic datasets for research on bias in machine learning. The datasets address the scarcity of suitable real data and let researchers vary the level of bias in a controlled way for experiments.
Contribution
The paper presents a novel set of synthetic datasets with adjustable bias levels, providing a valuable resource for bias research in machine learning.
Findings
The level of bias in the datasets can be varied in a controlled way.
A simple example experiment on the data illustrates how the datasets can be used.
The datasets support the analysis of bias in ML models.
Abstract
A significant impediment to progress in research on bias in machine learning (ML) is the limited availability of relevant datasets. This situation is unlikely to change much given the sensitivity of such data. For this reason, there is a role for synthetic data in this research. In this short paper, we present one such family of synthetic datasets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.
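To make the idea of a variable bias level concrete, here is a minimal sketch of one way such a dataset could be generated. It is not the procedure used in the paper, which this summary does not describe. The function names, the `bias` parameter, and the label-flipping mechanism are all hypothetical, chosen only for illustration.

```python
import random


def make_biased_dataset(n, bias, seed=0):
    """Hypothetical generator (an assumption, not the paper's method).

    Each row is (s, x, y): a binary sensitive attribute s, a score x drawn
    from the same distribution for both groups, and a binary label y.
    `bias` in [0, 1] is the probability that a positive label is flipped
    to negative when s == 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.randint(0, 1)          # sensitive attribute
        x = rng.gauss(0.0, 1.0)        # group-independent score
        y = 1 if x > 0 else 0          # unbiased label
        if s == 1 and y == 1 and rng.random() < bias:
            y = 0                      # inject bias against group s == 1
        rows.append((s, x, y))
    return rows


def positive_rate(rows, group):
    """Fraction of positive labels within one group."""
    labels = [y for s, _, y in rows if s == group]
    return sum(labels) / len(labels)


def disparate_impact(rows):
    """Ratio of positive rates (group 1 over group 0); 1.0 means parity.

    Assumes group 0 has at least one positive label.
    """
    return positive_rate(rows, 1) / positive_rate(rows, 0)


if __name__ == "__main__":
    for bias in (0.0, 0.25, 0.5, 1.0):
        rows = make_biased_dataset(20000, bias)
        print(bias, round(disparate_impact(rows), 3))
```

In this sketch, raising `bias` lowers the positive rate of the affected group, so the disparate impact ratio moves from roughly 1 (no bias) towards 0 (every positive label in that group removed).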
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms
