Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data
Salman Ul Hassan Dar, Marvin Seyfarth, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Robert Malte Siepmann, Fabian Christopher Laqua, Jannik Kahmann, Norbert Frey, Bettina Baeßler, Sebastian Foersch, Daniel Truhn, Jakob Nikolas Kather, Sandy Engelhardt

TL;DR
This study investigates the extent of patient data memorization in latent diffusion models used for medical image synthesis, highlighting privacy risks and factors influencing memorization.
Contribution
It introduces a novel self-supervised copy detection method and provides a comprehensive analysis of memorization in diffusion models across multiple medical imaging modalities.
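The general idea behind embedding-based copy detection can be illustrated with a minimal sketch. It is not the authors' method: it assumes that some self-supervised encoder has already mapped each image to an embedding vector, and it flags a synthetic sample as a copy candidate when its nearest training embedding exceeds a cosine-similarity threshold. The function names (`cosine_similarity`, `flag_copy_candidates`, `memorization_rate`) and the threshold of 0.95 are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_copy_candidates(
    synthetic: Sequence[Sequence[float]],
    training: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (synthetic_index, nearest_training_index, similarity) for each
    synthetic embedding whose nearest training embedding is at least
    `threshold` similar."""
    flagged = []
    for i, s in enumerate(synthetic):
        best_j, best_sim = -1, -1.0
        # Brute-force nearest-neighbour search over the training embeddings.
        for j, t in enumerate(training):
            sim = cosine_similarity(s, t)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_sim >= threshold:
            flagged.append((i, best_j, best_sim))
    return flagged


def memorization_rate(
    synthetic: Sequence[Sequence[float]],
    training: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> float:
    """Fraction of synthetic samples flagged as copy candidates."""
    if not synthetic:
        return 0.0
    return len(flag_copy_candidates(synthetic, training, threshold)) / len(synthetic)


if __name__ == "__main__":
    # Toy embeddings: the first synthetic vector nearly matches training[0].
    training = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    synthetic = [[0.99, 0.01, 0.0], [0.5, 0.5, 0.7]]
    print(flag_copy_candidates(synthetic, training))
    print(memorization_rate(synthetic, training))
```

In practice the threshold would have to be calibrated, for example against known augmented copies of training images, and the brute-force search would be replaced by an approximate nearest-neighbour index for large datasets.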
Findings
High patient data memorization observed in diffusion models
Augmentation and larger datasets reduce memorization
Over-training increases risk of memorization
Abstract
AI models present a wide range of applications in the field of medicine. However, achieving optimal performance requires access to extensive healthcare data, which is often not readily available. Furthermore, the imperative to preserve patient privacy restricts patient data sharing with third parties and even within institutes. Recently, generative AI models have been gaining traction for facilitating open-data sharing by proposing synthetic data as surrogates of real patient data. Despite the promise, some of these models are susceptible to patient data memorization, where models generate patient data copies instead of novel synthetic samples. Considering the importance of the problem, surprisingly it has received relatively little attention in the medical imaging community. To this end, we assess memorization in unconditional latent diffusion models. We train latent diffusion models…
Taxonomy
Topics: Machine Learning in Healthcare · AI in cancer detection · Radiomics and Machine Learning in Medical Imaging
Methods: Diffusion
