DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Boris van Breugel; Trent Kyono; Jeroen Berrevoets; Mihaela van der Schaar

arXiv:2110.12884 · cs.LG · November 8, 2021 · 26 cites

TL;DR

DECAF is a causally-aware GAN framework that generates fair synthetic tabular data by embedding a structural causal model in the generator, so that biased causal edges can be removed at inference time and downstream models trained on the synthetic data meet user-defined fairness requirements.

Contribution

We introduce DECAF, a novel GAN-based method that incorporates causal structure to produce fair synthetic data with theoretical guarantees on fairness and convergence.
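Because DECAF reconstructs each variable conditioned on its causal parents, variables have to be produced in an order where every parent comes before its children. The following is a minimal sketch of computing such an order with Kahn's algorithm; the function name `topological_order` and the toy graph over a protected attribute A, a feature X, and a label Y are hypothetical illustrations, not code from the DECAF repository.

```python
from collections import deque


def topological_order(edges: list[tuple[str, str]], nodes: list[str]) -> list[str]:
    """Return an order in which every node appears after all of its parents
    (Kahn's algorithm). Raises ValueError if the graph contains a cycle."""
    parents_left = {n: 0 for n in nodes}  # count of not-yet-emitted parents
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        parents_left[child] += 1

    # Start from root nodes (no parents), scanning nodes in the given order
    # so the result is deterministic.
    queue = deque(n for n in nodes if parents_left[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            parents_left[child] -= 1
            if parents_left[child] == 0:  # all parents emitted: child is ready
                queue.append(child)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; it is not a DAG")
    return order


# Hypothetical toy DAG: A (protected attribute) -> X -> Y, plus A -> Y.
nodes = ["Y", "X", "A"]
edges = [("A", "X"), ("X", "Y"), ("A", "Y")]
print(topological_order(edges, nodes))  # → ['A', 'X', 'Y']
```

A generator that follows this order can always condition a variable on values that have already been produced.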

Findings

01 Successfully removes biases from the synthetic data
02 Generates high-quality fair data compatible with multiple fairness definitions
03 Provides theoretical guarantees on fairness and convergence
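Whether synthetic data is fair is usually judged by training a downstream model on it and checking a fairness definition on that model's predictions. As one illustration, here is a minimal sketch of the demographic parity gap, a widely used definition, for a binary protected attribute and binary predictions. The function name and toy data are hypothetical, and this is not necessarily the exact metric computed in the paper.

```python
def demographic_parity_gap(predictions: list[int], protected: list[int]) -> float:
    """Absolute difference in positive-prediction rates between the two groups
    defined by a binary protected attribute (0/1). 0.0 means exact parity."""
    if len(predictions) != len(protected):
        raise ValueError("inputs must have the same length")
    rates = []
    for group in (0, 1):
        group_preds = [p for p, a in zip(predictions, protected) if a == group]
        if not group_preds:
            raise ValueError(f"no samples with protected attribute {group}")
        rates.append(sum(group_preds) / len(group_preds))  # P(pred = 1 | group)
    return abs(rates[1] - rates[0])


# Hypothetical predictions from a classifier trained on synthetic data.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
prot = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, prot))  # → 0.5
```

Group 0 receives positive predictions at a rate of 0.75 and group 1 at 0.25, so the gap is 0.5. A debiased generator should push this value toward 0 for models trained on its output.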

Abstract

Machine learning models have been criticized for reflecting unfair biases in the training data. Instead of addressing this by introducing fair learning algorithms directly, we focus on generating fair synthetic data, such that any downstream learner is fair. Generating fair synthetic data from unfair data, while remaining truthful to the underlying data-generating process (DGP), is non-trivial. In this paper, we introduce DECAF: a GAN-based fair synthetic data generator for tabular data. With DECAF, we embed the DGP explicitly as a structural causal model in the input layers of the generator, allowing each variable to be reconstructed conditioned on its causal parents. This procedure enables inference-time debiasing, where biased edges can be strategically removed to satisfy user-defined fairness requirements. The DECAF framework is versatile and compatible with several popular…
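The abstract describes two mechanisms: generating each variable conditioned on its causal parents, and debiasing at inference time by removing biased edges. The following is a minimal sketch of both ideas. It assumes a hand-specified linear-Gaussian structural causal model in place of DECAF's learned GAN generators, and the function `sample_scm`, the edge weights, and the variables A (protected), X, and Y are all hypothetical. Dropping only the direct edge A→Y still leaves the indirect path A→X→Y, so the edges to remove depend on the fairness definition being targeted.

```python
import random


def sample_scm(weights, order, n, removed_edges=frozenset(), seed=0):
    """Generate n rows from a linear-Gaussian SCM (hypothetical stand-in for
    DECAF's learned per-variable generators), variable by variable in causal
    order: value = Gaussian noise + sum(weight * parent value).
    Edges in removed_edges are ignored at sampling time, so the child is
    generated without that parent's influence."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for var in order:  # order must list every parent before its children
            value = rng.gauss(0.0, 1.0)  # exogenous noise for this variable
            for (parent, child), w in weights.items():
                if child == var and (parent, child) not in removed_edges:
                    value += w * row[parent]  # parent was generated earlier
            row[var] = value
        rows.append(row)
    return rows


# Hypothetical toy SCM: A -> X -> Y and a direct (biased) edge A -> Y.
weights = {("A", "X"): 1.5, ("X", "Y"): 2.0, ("A", "Y"): 1.0}
order = ["A", "X", "Y"]
biased = sample_scm(weights, order, n=5)
debiased = sample_scm(weights, order, n=5, removed_edges={("A", "Y")})

for b, d in zip(biased, debiased):
    # Same seed gives the same noise draws, so the only change in Y is the
    # dropped direct term weight(A -> Y) * A; A and X are unchanged.
    print(round(b["Y"] - d["Y"], 6), round(weights[("A", "Y")] * b["A"], 6))
```

Reusing the seed makes the effect of the removed edge easy to see. In a real generator, the debiased rows would instead be fresh samples from the same trained model with the chosen edges masked out.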

Code & Models

Repositories

vanderschaarlab/decaf (PyTorch, official)

Videos

DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks (SlidesLive)

Taxonomy

Topics: Ethics and Social Impacts of AI · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning