Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis
Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt

TL;DR
This study investigates whether 3D latent diffusion models used for medical image synthesis memorize sensitive training data, highlighting the need for strategies that prevent such memorization and protect patient privacy.
Contribution
The paper provides the first assessment of memorization in 3D latent diffusion models for medical imaging, revealing their tendency to memorize training data and emphasizing privacy concerns.
Findings
Models memorize training samples, risking patient privacy.
Self-supervised contrastive models can detect memorization.
Mitigation strategies are urgently needed.
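The detection finding rests on comparing synthetic samples with training samples in an embedding space. The sketch below shows one common way to do this: embed every volume with a contrastively trained encoder, find each synthetic sample's nearest training neighbour by cosine similarity, and flag pairs above a threshold. It assumes the embeddings have already been computed (the encoder itself is not shown), and the function names and the 0.95 threshold are hypothetical illustrations, not the paper's reported criterion.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_memorized(
    synthetic: List[Sequence[float]],
    training: List[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cut-off, chosen for illustration only
) -> List[Tuple[int, int, float]]:
    """Return (synthetic index, nearest training index, similarity) for every
    synthetic embedding whose nearest training neighbour exceeds the threshold."""
    flagged = []
    for i, s in enumerate(synthetic):
        sims = [cosine_similarity(s, t) for t in training]
        # Nearest training sample; ties resolve to the lowest index.
        j = max(range(len(sims)), key=sims.__getitem__)
        if sims[j] >= threshold:
            flagged.append((i, j, sims[j]))
    return flagged
```

For example, with training embeddings `[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]` and synthetic embeddings `[[0.99, 0.01, 0.0], [0.5, 0.5, 0.7]]`, only the first synthetic sample is flagged, as a near-copy of training sample 0. An exhaustive nearest-neighbour search is used here for clarity; at dataset scale an approximate nearest-neighbour index would be the usual choice.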
Abstract
Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is the generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite this promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.
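Contrastive self-supervised encoders are typically trained so that two augmented views of the same sample map to nearby embeddings while views of different samples are pushed apart. The sketch below is a simplified, one-directional InfoNCE-style loss illustrating that objective; the exact contrastive objective, augmentations, and architecture used in the paper are not specified on this page, so the function name and the temperature of 0.1 are assumptions.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def info_nce_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Simplified InfoNCE: anchors[i] and positives[i] embed two views of sample i;
    the other positives in the batch act as negatives for anchor i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [_cosine(anchors[i], positives[k]) / temperature for k in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view (index i) as the correct class.
        total += log_sum_exp - logits[i]
    return total / n
```

Pairs whose two views agree give a loss close to zero, while mismatched pairs give a large loss; an encoder trained to minimise this places near-duplicate volumes close together, which is what makes its embeddings useful for spotting memorized samples.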
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · AI in cancer detection · Machine Learning in Healthcare
Methods: Diffusion
