FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation
Nitish Nagesh, Ziyu Wang, Amir M. Rahmani

TL;DR
This paper introduces FairCauseSyn, an LLM-augmented method for generating synthetic health data that preserves causal fairness, with the aim of reducing bias and supporting more equitable health research.
Contribution
The authors present it as the first LLM-augmented synthetic data generation method to target causal fairness on real-world tabular health data, a gap left by existing methods.
Findings
Synthetic data deviates by less than 10% from real data on causal fairness metrics.
With causally fair predictors, the synthetic data reduces bias on the sensitive attribute by 70%.
The method aims to improve access to fair synthetic health data.
Abstract
Synthetic data generation creates data based on real-world data using generative models. In health applications, generating high-quality data while maintaining fairness for sensitive attributes is essential for equitable outcomes. Existing GAN-based and LLM-based methods focus on counterfactual fairness and are primarily applied in finance and legal domains. Causal fairness provides a more comprehensive evaluation framework by preserving causal structure, but current synthetic data generation methods do not address it in health settings. To fill this gap, we develop the first LLM-augmented synthetic data generation method to enhance causal fairness using real-world tabular health data. Our generated data deviates by less than 10% from real data on causal fairness metrics. When trained on causally fair predictors, synthetic data reduces bias on the sensitive attribute by 70% compared to…
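The "deviates by less than 10%" claim compares a fairness metric computed on real data with the same metric computed on synthetic data. The sketch below illustrates that arithmetic under loud assumptions: it is not the authors' code, the metric shown is a plain outcome-rate gap between two groups rather than the causal fairness metrics used in the paper (which this summary does not specify), and the column names `sensitive` and `outcome`, the helper names, and the toy rows are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]


def outcome_rate(rows: List[Row], group: int, attr: str, outcome: str) -> float:
    """Share of positive outcomes among rows whose sensitive attribute equals `group`."""
    values = [row[outcome] for row in rows if row[attr] == group]
    if not values:
        raise ValueError(f"no rows with {attr} == {group}")
    return sum(values) / len(values)


def outcome_gap(rows: List[Row], attr: str = "sensitive", outcome: str = "outcome") -> float:
    """Illustrative disparity: P(outcome=1 | attr=1) - P(outcome=1 | attr=0).

    Assumption: a stand-in for a fairness metric, not the paper's causal metric.
    """
    return outcome_rate(rows, 1, attr, outcome) - outcome_rate(rows, 0, attr, outcome)


def relative_deviation(real_value: float, synthetic_value: float) -> float:
    """Absolute difference between the two metric values, relative to the real one."""
    if real_value == 0:
        raise ValueError("relative deviation is undefined when the real metric is 0")
    return abs(synthetic_value - real_value) / abs(real_value)


def make_rows(group: int, outcomes: List[int]) -> List[Row]:
    """Build toy rows for one group (hypothetical data, for illustration only)."""
    return [{"sensitive": group, "outcome": y} for y in outcomes]


# Toy tables, not taken from the paper.
real_rows = make_rows(1, [1, 1, 1, 0]) + make_rows(0, [1, 0, 0, 0])
synthetic_rows = make_rows(1, [1, 1, 1, 0, 0]) + make_rows(0, [1, 0, 0, 0, 0])

real_gap = outcome_gap(real_rows)
synthetic_gap = outcome_gap(synthetic_rows)
deviation = relative_deviation(real_gap, synthetic_gap)

print(f"real gap: {real_gap:.2f}, synthetic gap: {synthetic_gap:.2f}")
print(f"relative deviation: {deviation:.1%}")
```

On these toy rows the synthetic gap differs from the real gap by roughly 20%, so this hypothetical synthetic table would not meet a 10% threshold. A relative deviation is used here because the claim is stated as a percentage; whether the paper normalizes the same way is an assumption.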
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
