Representative & Fair Synthetic Data

Paul Tiwald; Alexandra Ebert; Daniel T. Soukup

arXiv:2104.03007 · cs.LG · April 8, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a framework for generating synthetic data that is both representative and fair, aiming to reduce societal biases in AI training data while preserving data utility.

Contribution

It proposes a method for incorporating fairness constraints into the training of self-supervised generative models, enabling the creation of synthetic datasets with controlled, reduced bias.
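The summary does not spell out the exact form of the fairness constraint. As a minimal sketch, assuming statistical (demographic) parity on a binary target such as the Adult dataset's income > 50K label, a constraint can be added to the training objective as a penalty: the model's mean negative log-likelihood plus a weight `lam` times the gap between two groups' average predicted probabilities of the positive outcome. The function names, the penalty form, and `lam` are illustrative assumptions, not the paper's implementation; a real generative model would compute this per batch inside an autodiff framework rather than in plain Python.

```python
import math


def parity_gap(probs, groups):
    """Absolute gap in mean P(positive target) between exactly two groups.

    probs:  model probabilities that the binary target is positive
            (e.g. income > 50K), one per record
    groups: sensitive-attribute value per record (e.g. sex)
    """
    sums, counts = {}, {}
    for p, g in zip(probs, groups):
        sums[g] = sums.get(g, 0.0) + p
        counts[g] = counts.get(g, 0) + 1
    if len(sums) != 2:
        raise ValueError("expected exactly two sensitive groups")
    a, b = (sums[g] / counts[g] for g in sums)
    return abs(a - b)


def fair_loss(log_likelihoods, probs, groups, lam=1.0):
    """Hypothetical penalized objective: mean NLL plus lam times the parity gap."""
    nll = -sum(log_likelihoods) / len(log_likelihoods)
    return nll + lam * parity_gap(probs, groups)


# Toy batch: four records, each with log-likelihood log(0.5).
ll = [math.log(0.5)] * 4
probs = [0.8, 0.6, 0.3, 0.1]
groups = ["Male", "Male", "Female", "Female"]
print(fair_loss(ll, probs, groups, lam=2.0))  # about log(2) + 2 * 0.5
```

Setting `lam = 0` recovers plain maximum-likelihood training; increasing it trades some fidelity to the original distribution for a smaller parity gap.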

Findings

01

Successfully generated fair synthetic data for the UCI Adult census dataset.

02

Biases with respect to gender and race are controlled while the remaining statistical relationships in the data are preserved.

03

Downstream models trained on the fair synthetic data show reduced bias compared with models trained on the original data.
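One common way to quantify findings like these, assumed here because the summary does not name the metric, is the statistical parity difference: the spread in the rate of positive outcomes across sensitive-attribute groups. The same hypothetical helper can be applied to the target column of the original or synthetic data, or to the predictions of a downstream classifier evaluated on held-out real records. The toy labels below are made up for illustration and are not results from the paper.

```python
from collections import defaultdict


def positive_rates(outcomes, groups):
    """Share of positive outcomes (1) within each sensitive-attribute group."""
    pos = defaultdict(int)
    tot = defaultdict(int)
    for y, g in zip(outcomes, groups):
        pos[g] += int(y)
        tot[g] += 1
    return {g: pos[g] / tot[g] for g in tot}


def parity_difference(outcomes, groups):
    """Highest minus lowest positive rate across groups; 0.0 means parity."""
    rates = positive_rates(outcomes, groups)
    return max(rates.values()) - min(rates.values())


sex = ["Male"] * 4 + ["Female"] * 4
y_toy_biased = [1, 1, 1, 0, 1, 0, 0, 0]  # made-up labels, e.g. income > 50K
y_toy_fair = [1, 1, 0, 0, 1, 1, 0, 0]    # made-up labels with equal rates
print(parity_difference(y_toy_biased, sex))  # 0.5
print(parity_difference(y_toy_fair, sex))    # 0.0
```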

Abstract

Algorithms learn rules and associations based on the training data that they are exposed to. Yet the very same data that teaches machines to understand and predict the world contains societal and historic biases, resulting in biased algorithms that risk further amplifying these biases once put into use for decision support. Synthetic data, on the other hand, emerges with the promise to provide an unlimited amount of representative, realistic training samples that can be shared further without compromising the privacy of individual subjects. We present a framework to incorporate fairness constraints into the self-supervised learning process, which then allows one to simulate an unlimited amount of representative as well as fair synthetic data. This framework provides a handle to govern and control for privacy as well as for bias within AI at its very source: the training data. We demonstrate…
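Why a fitted generator can supply an "unlimited amount" of records can be illustrated with a deliberately simple, hypothetical stand-in: a count-based autoregressive model that learns P(column k | columns 1..k-1) from frequency tables and then samples any number of rows column by column. This is not the paper's deep generative model and includes no fairness constraint; it only shows the fit-once, sample-many pattern. Because it can only reproduce exact value combinations seen in training, a toy like this offers no privacy protection, whereas a real generator is expected to generalize.

```python
import random
from collections import Counter, defaultdict


def fit(rows, columns):
    """Count-based autoregressive model: P(col_k | values of col_1..col_{k-1})."""
    tables = [defaultdict(Counter) for _ in columns]
    for row in rows:
        prefix = ()
        for k, col in enumerate(columns):
            tables[k][prefix][row[col]] += 1
            prefix += (row[col],)
    return tables


def sample(tables, columns, n, seed=None):
    """Draw n synthetic rows, one column at a time, from the fitted conditionals."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        prefix = ()
        for k in range(len(columns)):
            counts = tables[k][prefix]  # prefix was always seen during fitting
            values = list(counts)
            value = rng.choices(values, weights=[counts[v] for v in values])[0]
            prefix += (value,)
        out.append(dict(zip(columns, prefix)))
    return out


rows = [  # toy records, made up for illustration
    {"sex": "Male", "income": ">50K"},
    {"sex": "Male", "income": "<=50K"},
    {"sex": "Female", "income": "<=50K"},
]
columns = ["sex", "income"]
tables = fit(rows, columns)
synthetic = sample(tables, columns, n=1000, seed=0)  # any n works
print(len(synthetic))  # 1000
```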

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)