Data Augmentation via Diffusion Model to Enhance AI Fairness
Christina Hastings Blow, Lijun Qian, Camille Gibson, Pamela Obiomon, Xishuang Dong

TL;DR
This paper investigates using diffusion models, specifically Tab-DDPM, to generate synthetic tabular data for data augmentation, aiming to improve fairness in AI binary classification tasks.
Contribution
It introduces a novel application of diffusion models for synthetic tabular data generation to enhance AI fairness, combined with reweighting techniques for better results.
Findings
Synthetic data from Tab-DDPM improves fairness metrics
Reweighting samples further enhances fairness outcomes
Multiple ML models benefit from augmented data in fairness improvements
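The reweighting idea in the findings above can be illustrated with a minimal sketch. It assumes the reweighting follows the standard reweighing scheme, where each sample in a (group, label) cell gets the weight P(group) · P(label) / P(group, label). This is a plain NumPy illustration, not the AIF360 implementation and not the authors' code; the function names and the toy data are hypothetical.

```python
import numpy as np

def reweighing_weights(group, label):
    """Per-sample weights w = P(group) * P(label) / P(group, label).

    Hypothetical helper: a plain-NumPy sketch of the reweighing scheme,
    not the AIF360 API.
    """
    group = np.asarray(group)
    label = np.asarray(label)
    weights = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            mask = (group == g) & (label == y)
            if mask.any():
                # Probability of the cell if group and label were independent
                expected = (group == g).mean() * (label == y).mean()
                # Probability of the cell actually observed in the data
                observed = mask.mean()
                weights[mask] = expected / observed
    return weights

def statistical_parity_difference(group, label, weights=None):
    """Weighted P(label=1 | group=0) minus weighted P(label=1 | group=1)."""
    group = np.asarray(group)
    label = np.asarray(label)
    if weights is None:
        weights = np.ones(len(label))

    def positive_rate(g):
        in_group = group == g
        return np.average(label[in_group] == 1, weights=weights[in_group])

    return positive_rate(0) - positive_rate(1)

# Toy data (made up for illustration): group 0 receives the favorable
# label 1 far less often than group 1.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
label = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 0])

w = reweighing_weights(group, label)
spd_before = statistical_parity_difference(group, label)
spd_after = statistical_parity_difference(group, label, w)
```

With these weights, the weighted favorable-outcome rate is the same in both groups, so the weighted statistical parity difference on the training data is zero by construction. Whether a downstream classifier trained on the weighted data is fairer still has to be checked empirically.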
Abstract
AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision. This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, reweighting samples from AIF360 was employed to further enhance AI fairness. Five…
Taxonomy
Topics: Traffic Prediction and Management Techniques · Big Data Technologies and Applications · Medical Imaging and Analysis
Methods: Softmax · Attention Is All You Need · Logistic Regression · Diffusion
