Finding Dori: Memorization in Text-to-Image Diffusion Models Is Not Local
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch

TL;DR
This paper demonstrates that memorization in text-to-image diffusion models is distributed rather than local and that pruning-based defenses are fragile. It challenges the locality assumption and proposes adversarial fine-tuning for more robust mitigation.
Contribution
It reveals that memorization is non-local, showing that existing pruning methods are insufficient, and introduces adversarial fine-tuning as a more effective mitigation strategy.
Findings
Memorization triggers are distributed throughout text embedding space.
Small perturbations can re-trigger memorization despite pruning.
Adversarial fine-tuning improves mitigation robustness.
Abstract
Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain due to their potential to inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning weights responsible for triggering verbatim training data replication, based on the assumption that memorization can be localized. We challenge this assumption and demonstrate that, even after such pruning, small perturbations to the text embeddings of previously mitigated prompts can re-trigger data replication, revealing the fragility of such defenses. Our further analysis then provides multiple indications that memorization is indeed not inherently local: (1) replication triggers for memorized images are distributed throughout text embedding space; (2) embeddings yielding the same…
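To make the idea of "small perturbations to the text embeddings" concrete, here is a minimal toy sketch of a bounded perturbation search in embedding space. It is not the authors' Dori procedure and does not involve a diffusion model: a fixed random linear map stands in for the generator, a random vector stands in for the memorized image, and the search is plain projected gradient descent under an L2 budget. All names (`apply_map`, `replication_loss`, `search_perturbation`, `eps`) are hypothetical and chosen for illustration only.

```python
import math
import random


def apply_map(W, emb):
    """Toy stand-in for a generator: a fixed linear map (assumption)."""
    return [sum(w * e for w, e in zip(row, emb)) for row in W]


def replication_loss(W, emb, target):
    """Half squared distance between the toy output and the target."""
    out = apply_map(W, emb)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def search_perturbation(W, emb0, target, eps, steps=200):
    """Projected gradient descent over delta with ||delta||_2 <= eps."""
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz
    # constant, so 1 / frob_sq is a safe (conservative) step size.
    frob_sq = sum(w * w for row in W for w in row)
    step = 1.0 / frob_sq
    delta = [0.0] * len(emb0)
    for _ in range(steps):
        emb = [e + d for e, d in zip(emb0, delta)]
        residual = [o - t for o, t in zip(apply_map(W, emb), target)]
        # Gradient of the loss with respect to delta is W^T residual.
        grad = [sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(delta))]
        delta = [d - step * g for d, g in zip(delta, grad)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > eps:  # project back onto the L2 ball of radius eps
            delta = [d * eps / norm for d in delta]
    return delta


# Hypothetical toy data: 8-dim "embedding", 16-dim "image".
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
emb0 = [rng.gauss(0, 1) for _ in range(8)]      # "mitigated" embedding
target = [rng.gauss(0, 1) for _ in range(16)]   # "memorized" output
eps = 0.5                                       # perturbation budget

delta = search_perturbation(W, emb0, target, eps)
loss_before = replication_loss(W, emb0, target)
loss_after = replication_loss(W, [e + d for e, d in zip(emb0, delta)], target)
delta_norm = math.sqrt(sum(d * d for d in delta))
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}, "
      f"|delta| = {delta_norm:.3f}")
```

In this toy setting the perturbed embedding stays within the budget `eps` while moving the output closer to the target. The sketch only illustrates the shape of such a search; it says nothing about how easily real pruned models can be re-triggered.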
Peer Reviews
Decision·Submitted to ICLR 2026
1. The paper addresses an important topic, memorization and privacy in diffusion models, which is highly relevant to model safety and responsible AI. 2. The paper is well-written and highlights an underexplored dimension of diffusion model memorization behavior.
1. **Unrealistic scenario.** The proposed setting is impractical in the real world. The proposed finetuning-based method seems to align with the model publisher’s side (those who wish to make their models trustworthy). Yet the paper assumes a white-box adversary with full access to the source code, as the authors explicitly state. Most real-world text-to-image (T2I) systems, such as Midjourney or ChatGPT, only expose text-level APIs, not model internals. Therefore, the claim
1. The paper is well-written and easy to follow. 2. The paper provides a novel and practical perspective on memorization in text-to-image diffusion models by explicitly challenging the locality assumption underlying prior pruning-based mitigation strategies. 3. The methodology is rigorous and comprehensive. 4. The experimental setup is sound, and the metrics used are clearly defined and justified. The authors thoroughly compare Dori against NeMo, Wanda, SISS, and concept-unlearning baselines
1. **Model Scope.** The analysis focuses exclusively on Stable Diffusion v1.4, the only model with known memorized prompts. While the authors justify this choice, it limits claims of generality. Extending the evaluation to even a partially curated SD v1.5 or fine-tuned variants would strengthen the argument. Could the authors comment on whether non-local memorization might emerge differently in larger or more recent models (e.g., SDXL or FLUX)? 2. **Computational Overhead.** The proposed adversa
Strengths: 1. Meaningful observation. This paper insightfully questions the memorization locality assumption behind pruning-based mitigation methods, showing that the memory is not fully erased even after pruning. These observations have the potential to inform later researchers studying memorization in diffusion models. This is the major reason I think it is worth accepting. 2. Clarity of the structure. The storyline of the paper is clear, from the meth
Weaknesses: 1. Experiment fairness. In Section 4.4, ‘our mitigation’ is optimized on Dori’s adversarial embeddings and is then compared to the baselines, again using Dori, as in Table 2. I wonder whether this experimental setting is unfair to the baselines: since only ‘our mitigation’ is optimized against the adversarial method, the baselines have no chance to gain the same advantage. 2. Supplementary experiments. If ‘our mitigation’ approach has another version without participation of adversaria
Taxonomy
Topics: Digital Humanities and Scholarship · Computational and Text Analysis Methods
