Downstream Fairness Caveats with Synthetic Healthcare Data
Karan Bhanot, Ioana Baldini, Dennis Wei, Jiaming Zeng, Kristin P. Bennett

TL;DR
This study assesses the fairness of synthetic healthcare data generated by GANs, revealing that synthetic data exhibits different bias characteristics than real data and that fairness mitigation techniques have varied effects.
Contribution
It provides a systematic evaluation of biases in synthetic healthcare data and compares the impact of fairness mitigation methods on data utility and fairness.
Findings
Synthetic data shows different fairness properties than real data.
Fairness mitigation techniques have varied effects on synthetic data.
Synthetic data is not inherently bias-free.
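To make the first finding concrete, the sketch below compares one common group fairness measure, the demographic parity difference, on predictions from a model trained on real data and one trained on synthetic data. This summary does not say which fairness metrics the paper uses, so the choice of metric is an assumption. The function names and the toy predictions are hypothetical illustrations, not results from the paper.

```python
# Minimal sketch (assumption: demographic parity difference as the fairness
# measure; the summary does not name the metrics the paper actually uses).

def selection_rate(preds):
    """Fraction of positive (1) predictions in a list of 0/1 values."""
    return sum(preds) / len(preds)


def demographic_parity_difference(preds, groups, privileged):
    """Selection rate of the unprivileged group minus that of the privileged group.

    A value of 0 means both groups receive positive predictions at the same
    rate; a negative value means the unprivileged group is selected less often.
    """
    priv = [p for p, g in zip(preds, groups) if g == privileged]
    unpriv = [p for p, g in zip(preds, groups) if g != privileged]
    return selection_rate(unpriv) - selection_rate(priv)


# Hypothetical toy data: protected attribute and two sets of model predictions.
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
real_preds = [1, 1, 1, 0, 1, 0, 0, 0]   # model trained on real data (made up)
synth_preds = [1, 1, 0, 0, 1, 1, 0, 0]  # model trained on synthetic data (made up)

real_gap = demographic_parity_difference(real_preds, groups, privileged="M")
synth_gap = demographic_parity_difference(synth_preds, groups, privileged="M")

print(f"real data gap:      {real_gap:+.2f}")
print(f"synthetic data gap: {synth_gap:+.2f}")
```

In this made-up example the two gaps differ, which is the kind of discrepancy the findings describe: a fairness assessment made on synthetic data need not carry over to the real data, in either direction.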
Abstract
This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on the utility-fairness trade-off. Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy. Although essential, these laws hinder research reproducibility. Synthetic data is a viable solution that can enable access to data similar to real healthcare data without privacy risks. Healthcare datasets may have biases in which certain protected groups might experience worse outcomes than others. With the real data having biases, the fairness of synthetically generated health data comes into question. In this paper, we evaluate the fairness of models generated on two healthcare datasets for gender and race biases. We generate synthetic versions of the datasets using a Generative Adversarial Network called HealthGAN, and…
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life · Global Health Care Issues · Insurance, Mortality, Demography, Risk Management
