Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data

Salman Ul Hassan Dar; Marvin Seyfarth; Isabelle Ayx; Theano Papavassiliu; Stefan O. Schoenberg; Robert Malte Siepmann; Fabian Christopher Laqua; Jannik Kahmann; Norbert Frey; Bettina Baeßler; Sebastian Foersch; Daniel Truhn; Jakob Nikolas Kather; Sandy Engelhardt

arXiv:2402.01054·eess.IV·January 9, 2025·2 cites

Open Access · 1 Repo

TL;DR

This study investigates the extent of patient data memorization in latent diffusion models used for medical image synthesis, highlighting privacy risks and factors influencing memorization.

Contribution

It introduces a novel self-supervised copy detection method and provides comprehensive analysis of memorization in diffusion models across multiple medical imaging modalities.
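As a rough illustration of what embedding-based copy detection can look like, the sketch below flags synthetic samples whose embedding lies very close to some training embedding. It is not the paper's method: the function name flag_copy_candidates, the use of cosine similarity, and the 0.9 threshold are all assumptions made for this example, and the embeddings are assumed to come from some separately trained self-supervised encoder.

```python
import numpy as np


def flag_copy_candidates(train_emb, synth_emb, threshold=0.9):
    """Hypothetical helper: match each synthetic embedding to its nearest
    training embedding and flag pairs above a similarity threshold.

    train_emb: array of shape (n_train, d), embeddings of training images
    synth_emb: array of shape (n_synth, d), embeddings of synthetic images
    threshold: assumed cosine-similarity cutoff (illustrative value only)
    """
    train_emb = np.asarray(train_emb, dtype=float)
    synth_emb = np.asarray(synth_emb, dtype=float)

    # L2-normalise rows so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    synth = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)

    # Pairwise similarities, shape (n_synth, n_train).
    sim = synth @ train.T

    # Nearest training sample and its similarity for every synthetic sample.
    nearest = sim.argmax(axis=1)
    best = sim[np.arange(sim.shape[0]), nearest]

    return nearest, best, best >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 32))
    # First synthetic sample is a slightly perturbed copy of training sample 7.
    synth = np.vstack([train[7] + 0.01 * rng.normal(size=32),
                       rng.normal(size=(4, 32))])
    idx, score, is_copy = flag_copy_candidates(train, synth)
    print(idx[0], is_copy[0])
```

In practice the threshold would need to be calibrated on known copy and non-copy pairs rather than fixed in advance.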

Findings

01 High patient data memorization observed in diffusion models

02 Augmentation and larger datasets reduce memorization

03 Over-training increases risk of memorization

Abstract

AI models have a wide range of applications in medicine. However, achieving optimal performance requires access to extensive healthcare data, which is often not readily available. Furthermore, the imperative to preserve patient privacy restricts the sharing of patient data with third parties and even within institutes. Recently, generative AI models have been gaining traction for facilitating open-data sharing by proposing synthetic data as surrogates of real patient data. Despite this promise, some of these models are susceptible to patient data memorization, where models generate copies of patient data instead of novel synthetic samples. Despite its importance, the problem has received surprisingly little attention in the medical imaging community. To this end, we assess memorization in unconditional latent diffusion models. We train latent diffusion models…

Code & Models

Repositories

Cardio-AI/memorization-ldm
PyTorch · Official

Taxonomy

Topics: Machine Learning in Healthcare · AI in cancer detection · Radiomics and Machine Learning in Medical Imaging

Methods: Diffusion