FairTabGen: High-Fidelity and Fair Synthetic Health Data Generation from Limited Samples

Nitish Nagesh; Salar Shakibhamedan; Mahdi Bagheri; Ziyu Wang; Nima TaheriNejad; Axel Jantsch; Amir M. Rahmani

arXiv:2508.11810·cs.LG·February 19, 2026

TL;DR

FairTabGen is a novel LLM-based framework that generates high-quality, fair synthetic healthcare data from limited samples, addressing privacy concerns and reducing computational requirements in clinical research.

Contribution

It introduces a new method combining in-context learning, prompt curation, and structural constraints for efficient, fair synthetic health data generation from small datasets.
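As a rough illustration of how in-context examples and structural constraints might be combined in one prompt, the sketch below builds a few-shot prompt from a handful of rows and checks generated rows against a schema. The column names, the `SCHEMA` dictionary, and the functions `build_prompt` and `is_valid` are hypothetical and are not taken from the paper; no LLM is called here.

```python
import json
import random

# Hypothetical schema (not from the paper): column -> (min, max) range or allowed categories.
SCHEMA = {
    "age": (18, 90),
    "gender": ["F", "M"],
    "admission_type": ["EMERGENCY", "ELECTIVE", "URGENT"],
}


def build_prompt(rows, schema, n_examples=3, n_generate=5, seed=0):
    """Assemble a few-shot prompt: constraints first, then sampled example rows."""
    rng = random.Random(seed)
    examples = rng.sample(rows, min(n_examples, len(rows)))
    lines = [
        "Generate synthetic patient records as JSON objects, one per line.",
        "Constraints:",
    ]
    for col, rule in schema.items():
        if isinstance(rule, tuple):
            lines.append(f"- {col}: integer between {rule[0]} and {rule[1]}")
        else:
            lines.append(f"- {col}: one of {', '.join(rule)}")
    lines.append("Examples:")
    lines.extend(json.dumps(row, sort_keys=True) for row in examples)
    lines.append(f"Now generate {n_generate} new records.")
    return "\n".join(lines)


def is_valid(row, schema):
    """Return True only if the row has exactly the schema columns and obeys every rule."""
    if set(row) != set(schema):
        return False
    for col, rule in schema.items():
        value = row[col]
        if isinstance(rule, tuple):
            if isinstance(value, bool) or not isinstance(value, int):
                return False
            if not rule[0] <= value <= rule[1]:
                return False
        elif value not in rule:
            return False
    return True
```

In a pipeline of this shape, the prompt would be sent to a model of choice and each returned record filtered through `is_valid` before being added to the synthetic dataset.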

Findings

01

Uses 99% less data than traditional methods

02

Achieves a 50% improvement in fairness through unawareness

03

Enhances fairness by 10% with bias mitigation techniques

Abstract

Synthetic healthcare data generation offers a promising solution to research limitations in clinical settings caused by privacy and regulatory constraints. However, current synthetic data generation approaches require specialized knowledge of training generative models and substantial computational resources. In this paper, we propose FairTabGen, an LLM-based tabular data generation framework that produces high-quality synthetic healthcare data using only a small subset of the original dataset. Our method combines in-context learning, prompt curation, and the embedding of structural constraints for data synthesis. We evaluate performance on the MIMIC-IV dataset. Our method uses 99% less data and achieves a 50% improvement in fairness through unawareness while maintaining competitive predictive utility. However, we observe that the data distribution of racial groups is skewed, affecting demographic…
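For readers unfamiliar with the term, fairness through unawareness generally means that a downstream model is not given protected attributes as inputs. The minimal sketch below shows that idea only; the attribute names in `SENSITIVE` and the helper `drop_sensitive` are assumptions for illustration, and how the paper scores this criterion is not stated in the text above.

```python
# Hypothetical protected attributes; the paper's actual column names may differ.
SENSITIVE = frozenset({"race", "gender"})


def drop_sensitive(rows, sensitive=SENSITIVE):
    """Return copies of the rows with protected attributes removed from the features."""
    return [
        {col: value for col, value in row.items() if col not in sensitive}
        for row in rows
    ]
```

Removing these columns does not by itself remove bias, since other features can act as proxies for the protected attributes.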

Taxonomy

Topics: Ethics and Social Impacts of AI