Generative clinical time series models trained on moderate amounts of patient data are privacy preserving
Rustam Zhumagambetov, Niklas Giesa, Sebastian D. Boie, Stefan Haufe

TL;DR
This study evaluates the privacy preservation of generative models for clinical time series data, demonstrating that with sufficient training data, these models resist privacy attacks without additional differential privacy mechanisms, which often reduce data utility.
Contribution
The paper provides an empirical privacy audit of state-of-the-art generative models on clinical data, showing that these models are inherently robust to privacy attacks when trained on large datasets, and discusses the limitations of differential privacy methods.
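Differential privacy for deep generative models is commonly implemented with DP-SGD. Each training example's gradient is clipped to a fixed L2 norm, and Gaussian noise is added to the summed gradient before the update. The sketch below is a minimal NumPy version of that aggregation step. It assumes per-example gradients are already available as a (batch, params) array. The function name `dp_sgd_step` and its parameters are illustrative. The summary does not say which DP mechanism the authors evaluated.

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray, clip_norm: float,
                noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
    """Privatized average gradient in the style of DP-SGD (illustrative sketch).

    per_example_grads has shape (batch_size, num_params).
    """
    # L2 norm of each example's gradient, shape (batch_size, 1).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink any gradient whose norm exceeds clip_norm; leave smaller ones unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping bound, added once to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

grads = np.array([[3.0, 4.0], [0.0, 0.5]])
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0,
                     rng=np.random.default_rng(0))
```

The noise scale depends only on `noise_multiplier * clip_norm`, not on how informative the gradients are. This gives one intuition for why DP training can cost data utility.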
Findings
Privacy attacks are ineffective on models trained with large datasets.
Differential privacy mechanisms may reduce data utility without improving privacy.
Synthetic data preserves privacy when the generative model is trained on a sufficiently large dataset.
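The summary does not name the attacks used in the audit. As a stand-in, the sketch below uses a simple distance-to-closest-record membership inference attack to show what "ineffective" means in practice. A candidate record is scored as a likely training member if some synthetic record lies very close to it. Attack quality is then measured as ROC AUC. The data and both "generators" are toy stand-ins, and the function names are hypothetical.

```python
import numpy as np

def dcr_scores(candidates: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Membership score per candidate: negative distance to the closest synthetic record."""
    # Pairwise Euclidean distances, shape (n_candidates, n_synthetic).
    dists = np.linalg.norm(candidates[:, None, :] - synthetic[None, :, :], axis=2)
    # Closer to some synthetic record -> higher score -> "more likely a training member".
    return -dists.min(axis=1)

def attack_auc(member_scores: np.ndarray, nonmember_scores: np.ndarray) -> float:
    """ROC AUC: probability a random member outscores a random non-member (ties count half)."""
    diff = member_scores[:, None] - nonmember_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy flattened "patient" feature vectors (purely synthetic, for illustration).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))    # records the generator was trained on
holdout = rng.normal(size=(200, 8))  # same population, never used for training

# A generator that memorizes: it outputs near-copies of training records.
memorizing = train + 0.01 * rng.normal(size=train.shape)
# A generator that generalizes: it outputs fresh draws from the population distribution.
generalizing = rng.normal(size=(200, 8))

auc_memorizing = attack_auc(dcr_scores(train, memorizing), dcr_scores(holdout, memorizing))
auc_generalizing = attack_auc(dcr_scores(train, generalizing), dcr_scores(holdout, generalizing))
print(f"memorizing: {auc_memorizing:.2f}, generalizing: {auc_generalizing:.2f}")
```

An AUC near 1.0 means the attacker can reliably tell training patients from non-members. An AUC near 0.5 means the attack does no better than guessing, which is the situation the findings describe for generators trained on enough data.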
Abstract
Sharing medical data for machine learning model training purposes is often impossible due to the risk of disclosing identifying information about individual patients. Synthetic data produced by generative artificial intelligence (genAI) models trained on real data is often seen as one possible solution to comply with privacy regulations. While powerful genAI models for heterogeneous hospital time series have recently been introduced, such modeling does not guarantee privacy protection, as the generated data may still reveal identifying information about individuals in the models' training cohort. Applying established privacy mechanisms to generative time series models, however, proves challenging as post-hoc data anonymization through k-anonymization or similar techniques is limited, while model-centered privacy mechanisms that implement differential privacy (DP) may lead to unstable…
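k-anonymity requires every record to share its quasi-identifier values with at least k - 1 other records. The abstract is cut off before it explains why post-hoc anonymization is limited for time series. One plausible reason, assumed here and not taken from the authors, is that a patient's measurement trajectory is nearly unique on its own. The sketch below uses hypothetical field names (`age_band`, `sex`, `heart_rate`) and toy values to illustrate this.

```python
from collections import Counter

def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Smallest equivalence-class size when records are grouped by quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical ICU records: coarse demographics plus a short hourly heart-rate trajectory.
records = [
    {"age_band": "60-69", "sex": "F", "heart_rate": (72, 75, 80)},
    {"age_band": "60-69", "sex": "F", "heart_rate": (90, 88, 85)},
    {"age_band": "60-69", "sex": "M", "heart_rate": (70, 70, 71)},
    {"age_band": "60-69", "sex": "M", "heart_rate": (65, 68, 66)},
]

k_demographics = k_anonymity(records, ["age_band", "sex"])
k_with_series = k_anonymity(records, ["age_band", "sex", "heart_rate"])
```

Coarse demographics alone give k = 2 here. Once the trajectory is treated as a quasi-identifier, every record forms its own class (k = 1). Generalizing the series enough to restore a useful k would discard much of the temporal signal.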
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
