Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Zeyu Yang; Han Yu; Peikun Guo; Khadija Zanna; Xiaoxue Yang; Akane Sano

arXiv:2404.08254 · cs.LG · March 5, 2025 · 1 citation

TL;DR

This paper presents a novel diffusion-based method for generating fair synthetic tabular data that reduces bias and improves fairness metrics while maintaining high data quality.

Contribution

We introduce a sensitive-guided diffusion model that enhances fairness in synthetic tabular data generation, outperforming existing methods on key fairness metrics.

Findings

01. Effectively mitigates bias in training data
02. Outperforms existing methods on fairness metrics
03. Maintains high quality of generated samples

Abstract

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data, which may influence discriminatory actions. In this research, we introduce a novel tabular diffusion model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes, such as sex and race. The empirical results demonstrate that our method effectively mitigates bias in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data on fairness metrics such as demographic parity ratio and equalized odds ratio, achieving improvements of over $10\%$. Our…
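
The two fairness metrics named in the abstract can be illustrated with a short sketch. It is not taken from the paper or its repository. It assumes the commonly used definitions (as in, for example, Fairlearn): the demographic parity ratio is the smallest group selection rate divided by the largest, and the equalized odds ratio is the smaller of the across-group true-positive-rate ratio and false-positive-rate ratio. The function names, the toy arrays, and the binary 0/1 label encoding are all assumptions made for illustration.

```python
import numpy as np


def _min_max_ratio(rates):
    """Return min(rates) / max(rates), or NaN if the largest rate is zero."""
    hi = max(rates)
    return float(min(rates) / hi) if hi > 0 else float("nan")


def demographic_parity_ratio(y_pred, sensitive):
    """Ratio of the smallest to the largest positive-prediction rate across groups."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return _min_max_ratio(rates)


def equalized_odds_ratio(y_true, y_pred, sensitive):
    """Smaller of the TPR ratio and the FPR ratio across groups.

    Assumes binary 0/1 labels and that every group contains both classes.
    """
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    ratios = []
    for label in (1, 0):  # label 1 gives the TPR, label 0 gives the FPR
        rates = [
            y_pred[(sensitive == g) & (y_true == label)].mean()
            for g in np.unique(sensitive)
        ]
        ratios.append(_min_max_ratio(rates))
    return min(ratios)


# Toy data (hypothetical): two groups, "a" and "b", four rows each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpr = demographic_parity_ratio(y_pred, group)      # selection rates 0.5 vs 0.75
eor = equalized_odds_ratio(y_true, y_pred, group)  # TPR 0.5 vs 1.0, FPR 0.5 vs 0.5
print(f"demographic parity ratio: {dpr:.3f}")
print(f"equalized odds ratio:     {eor:.3f}")
```

Both ratios lie in [0, 1], and a value of 1 means the groups are treated identically under that criterion. In a synthetic-data setting these would typically be computed on the predictions of a downstream classifier trained on the generated table, though the exact evaluation protocol used in the paper is not stated in this excerpt.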

Code & Models

Repositories

comp-well-org/fair-tab-diffusion (PyTorch, official)

Taxonomy

Topics: Advanced Database Systems and Queries

Methods: Diffusion