DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
Arshia Ilaty, Hossein Shirazi, Amir Rahmani, Hajar Homayouni

TL;DR
DISCO-TAB is a hierarchical reinforcement learning framework for privacy-preserving synthesis of clinical data. It guides a fine-tuned LLM with discriminator feedback so that synthetic records capture complex feature dependencies while preserving data utility and privacy.
Contribution
DISCO-TAB pairs a fine-tuned LLM with a multi-granularity discriminator system, optimized via reinforcement learning, to improve the realism and utility of synthetic EHR data. The authors report that it surpasses prior methods.
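As a rough illustration of what multi-granularity feedback can look like, the sketch below folds four discriminator scores (the abstract names token, sentence, feature, and row granularities) into a single scalar reward with a weighted sum. The class name `GranularityScores`, the function `combined_reward`, the equal weights, and the example scores are all hypothetical. The source does not state how DISCO-TAB aggregates its signals, so this should not be read as the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class GranularityScores:
    """Hypothetical discriminator scores in [0, 1], one per granularity."""
    token: float
    sentence: float
    feature: float
    row: float


def combined_reward(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four scores (an assumed aggregation rule)."""
    parts = (scores.token, scores.sentence, scores.feature, scores.row)
    if len(weights) != len(parts):
        raise ValueError("expected one weight per granularity")
    return sum(w * s for w, s in zip(weights, parts))


# Made-up scores for one generated record.
example = GranularityScores(token=0.8, sentence=0.6, feature=0.9, row=0.7)
reward = combined_reward(example)
```

A weighted sum is only the simplest option. Its practical appeal is that each weight exposes how much one granularity influences the policy update.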
Findings
Achieves up to a 38.2% improvement in clinical classifier utility.
Maintains statistical fidelity, with Jensen-Shannon divergence (JSD) < 0.01.
Demonstrates robustness against membership inference attacks.
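For readers unfamiliar with the fidelity metric, the Jensen-Shannon divergence (JSD) between a real and a synthetic column can be computed as in the minimal sketch below. It assumes a single categorical feature and a base-2 logarithm, which bounds JSD in [0, 1]. The source does not state the paper's log base or how per-feature values are aggregated, and the sample data is invented for illustration.

```python
import math
from collections import Counter


def category_probs(values, categories):
    """Empirical probability of each category, in a fixed order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions, using log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Invented example: one categorical column, real vs. synthetic.
real = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
synthetic = ["A"] * 48 + ["B"] * 31 + ["C"] * 21
cats = sorted(set(real) | set(synthetic))
jsd = jensen_shannon_divergence(
    category_probs(real, cats), category_probs(synthetic, cats)
)
```

Identical distributions give a JSD of 0 and fully disjoint ones give 1 under this base, so a value below 0.01 indicates that the synthetic marginal tracks the real one closely.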
Abstract
The development of robust clinical decision support systems is frequently impeded by the scarcity of high-fidelity, privacy-preserving biomedical data. While Generative Large Language Models (LLMs) offer a promising avenue for synthetic data generation, they often struggle to capture the complex, non-linear dependencies and severe class imbalances inherent in Electronic Health Records (EHR), leading to statistically plausible but clinically invalid records. To bridge this gap, we introduce DISCO-TAB (DIScriminator-guided COntrol for TABular synthesis), a novel framework that orchestrates a fine-tuned LLM with a multi-objective discriminator system optimized via Reinforcement Learning. Unlike prior methods relying on scalar feedback, DISCO-TAB evaluates synthesis at four granularities (token, sentence, feature, and row) while integrating Automated Constraint Discovery and…
