Fairness-Optimized Synthetic EHR Generation for Arbitrary Downstream Predictive Tasks

Mirza Farhan Bin Tarek; Raphael Poulain; Rahmatollah Beheshti

arXiv:2406.02510 · cs.LG · June 30, 2025 · 2 cites

TL;DR

This paper introduces a pipeline for generating synthetic EHR data that stays faithful to the real data and, when combined with it, reduces end-user-defined fairness concerns in downstream clinical prediction tasks.

Contribution

The study presents a task- and model-agnostic pipeline for synthetic EHR generation that improves fairness in downstream health AI applications. It targets a gap in existing work, where task- and model-agnostic fairness methods for EHR-based models are rare.

Findings

01 Effective in reducing fairness concerns across multiple tasks
02 Applicable to different EHR datasets
03 Complementary to existing fairness methods

Abstract

Among the various aspects of ensuring the responsible design of AI tools for healthcare applications, addressing fairness concerns has been a key focus area. Specifically, given the widespread use of electronic health record (EHR) data and their huge potential to inform a wide range of clinical decision support tasks, improving fairness in this category of health AI tools is of key importance. While this broad problem (mitigating fairness concerns in EHR-based AI models) has been tackled using various methods, task- and model-agnostic methods are noticeably rare. In this study, we aimed to target this gap by presenting a new pipeline that generates synthetic EHR data that is not only consistent with (faithful to) the real EHR data but can also reduce the fairness concerns (defined by the end-user) in the downstream tasks when combined with the real data. We demonstrate the effectiveness of our…
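The abstract leaves the fairness concern to be "defined by the end-user" and does not fix a particular metric. As a minimal sketch of what one such end-user-defined measure could look like, the Python below computes a demographic parity gap (the largest difference in positive-prediction rate between groups) for a downstream classifier's outputs. The choice of metric, the function names, the group labels, and the toy predictions are all assumptions made for illustration; none of them come from the paper or its repository.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the fraction of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Toy, made-up inputs (NOT results from the paper): binary predictions from
# two hypothetical downstream models over the same eight patients, who belong
# to two hypothetical demographic groups "A" and "B".
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
preds_model_a = [1, 1, 1, 0, 1, 0, 0, 0]  # positive rate: A = 0.75, B = 0.25
preds_model_b = [1, 1, 0, 0, 1, 1, 0, 0]  # positive rate: A = 0.50, B = 0.50

gap_model_a = demographic_parity_gap(preds_model_a, groups)
gap_model_b = demographic_parity_gap(preds_model_b, groups)
print(gap_model_a, gap_model_b)
```

A gap of zero means every group receives positive predictions at the same rate. In a setting like the one the abstract describes, a measure of this kind could be compared between a model trained on real data alone and one trained on real plus synthetic data, though the metric actually used would be whatever the end-user specifies.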

Code & Models

Repositories

healthylaife/fairsynth (Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)

Methods: Focus