Introducing a Family of Synthetic Datasets for Research on Bias in Machine Learning

William Blanzeisky; Pádraig Cunningham; Kenneth Kennedy

arXiv:2107.08928 · cs.LG · August 5, 2021 · 1 citation

TL;DR

This paper introduces a family of synthetic datasets for research on bias in machine learning. Because sensitive real-world data is hard to obtain, the datasets let researchers vary the level of bias in a controlled way for experiments.

Contribution

The paper presents a family of synthetic datasets whose level of bias can be adjusted, giving researchers a resource for bias experiments when relevant real data is unavailable.

Findings

01 The datasets allow the level of bias to be varied in a controlled way.

02 A simple example experiment demonstrates how the datasets can be used.

03 The datasets facilitate bias analysis of ML models.

Abstract

A significant impediment to progress in research on bias in machine learning (ML) is the availability of relevant datasets. This situation is unlikely to change much given the sensitivity of such data. For this reason, there is a role for synthetic data in this research. In this short paper, we present one such family of synthetic data sets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.
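
To make the idea of a tunable bias level concrete, here is a minimal sketch in Python. It is not the authors' generation procedure, which this page does not describe. The function names, the `bias` parameter, the two-group setup, and the label-flipping mechanism are all assumptions chosen for illustration. The sketch draws a single score per individual, assigns a positive label when the score is above zero, and then flips a fraction of positive labels to negative for one group. The ratio of positive rates between the groups (often called the disparate impact ratio) then falls as `bias` rises.

```python
import random


def make_biased_dataset(n=10000, bias=0.0, seed=0):
    """Return a list of (score, group, label) rows.

    Hypothetical generator, for illustration only.
    bias: probability in [0, 1] that a positive label is flipped to
    negative for members of group 1 (the disadvantaged group here).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = 1 if rng.random() < 0.5 else 0   # two equally likely groups
        score = rng.gauss(0.0, 1.0)              # one merit-like feature
        label = 1 if score > 0 else 0            # unbiased outcome
        # Inject bias: some deserving members of group 1 lose the positive label.
        if group == 1 and label == 1 and rng.random() < bias:
            label = 0
        rows.append((score, group, label))
    return rows


def positive_rate(rows, group):
    """Fraction of rows in the given group that carry a positive label."""
    labels = [label for _, g, label in rows if g == group]
    return sum(labels) / len(labels)


def disparate_impact(rows):
    """Positive rate of group 1 divided by positive rate of group 0."""
    return positive_rate(rows, 1) / positive_rate(rows, 0)


# Sweep the bias level and observe how the ratio changes.
for level in (0.0, 0.3, 0.6):
    data = make_biased_dataset(bias=level, seed=1)
    print(f"bias={level:.1f}  disparate impact={disparate_impact(data):.2f}")
```

With `bias=0.0` the two groups receive positive labels at roughly the same rate, and the ratio drops as `bias` increases. A bias experiment in this style would train a classifier on data generated at several bias levels and compare how the bias in its predictions changes.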

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms