The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation
Raman Dutt

TL;DR
This paper investigates how prompts containing de-identification traces increase memorization risk in synthetic chest X-ray generation, revealing limitations of current privacy mitigation strategies and proposing new approaches.
Contribution
It systematically identifies prompts and tokens contributing to memorization in medical image synthesis, highlighting issues with anonymization practices and evaluating mitigation strategies.
Findings
Among all tokens, de-identification markers contribute the most to memorization.
Prompts with de-identification traces significantly increase memorization risk.
Existing mitigation strategies are ineffective against prompt-based memorization.
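
To make the notion of a "de-identification trace" concrete, here is a minimal sketch of how prompts could be flagged before any memorization analysis. It assumes that the marker is a run of three or more underscores (the placeholder MIMIC-CXR reports use for removed Protected Health Information); this summary does not state the marker explicitly, so treat the pattern as an assumption. The names `DEID_MARKER`, `has_deid_trace`, `split_by_deid`, and the example prompts are hypothetical and are not taken from the paper or its code.

```python
import re

# Assumption: de-identification leaves a run of three or more underscores
# where Protected Health Information was removed.
DEID_MARKER = re.compile(r"_{3,}")


def has_deid_trace(prompt: str) -> bool:
    """Return True if the prompt contains at least one de-identification marker."""
    return DEID_MARKER.search(prompt) is not None


def split_by_deid(prompts):
    """Partition prompts into (with marker, without marker), preserving order."""
    flagged = [p for p in prompts if has_deid_trace(p)]
    clean = [p for p in prompts if not has_deid_trace(p)]
    return flagged, clean


# Made-up prompts for illustration only; they are not drawn from any dataset.
example_prompts = [
    "No significant change compared to the prior study from ___.",
    "Lungs are clear. No pleural effusion.",
    "Dr. ___ was notified of the findings at ___.",
]

flagged, clean = split_by_deid(example_prompts)
fraction_flagged = len(flagged) / len(example_prompts)
```

A split like this only separates the two prompt groups; measuring how much each group is memorized would still require a memorization metric, which this sketch does not attempt.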
Abstract
Generative models, particularly text-to-image (T2I) diffusion models, play a crucial role in medical image analysis. However, these models are prone to training data memorization, posing significant risks to patient privacy. Synthetic chest X-ray generation is one of the most common applications in medical image analysis, with the MIMIC-CXR dataset serving as the primary data repository for this task. This study presents the first systematic attempt to identify prompts and text tokens in MIMIC-CXR that contribute the most to training data memorization. Our analysis reveals two unexpected findings: (1) prompts containing traces of de-identification procedures (markers introduced to hide Protected Health Information) are the most memorized, and (2) among all tokens, de-identification markers contribute the most towards memorization. This highlights a broader issue with the standard…
Taxonomy
Topics: Medical Imaging Techniques and Applications · Advanced Radiotherapy Techniques · Digital Radiography and Breast Imaging
Methods: Diffusion
