Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models
Nick Dodson, Xinyu Gao, Qingsong Wang, Yusu Wang, Zhengchao Wan

TL;DR
This paper introduces a geometric framework for understanding when diffusion models memorize training data rather than generalize. It shows that memorization risk is non-uniform across noise levels and proposes interventions to reduce memorization.
Contribution
The paper offers a geometric perspective on memorization in diffusion models, identifies a danger zone at medium noise levels, and proposes targeted mitigation strategies.
Findings
Memorization risk varies non-uniformly across noise levels.
Medium noise levels pose the highest memorization risk.
Small and large noise regimes resist memorization through different mechanisms.
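As a rough illustration of why noise level matters, the sketch below computes the posterior over training points given a noisy sample, for a finite training set with a uniform prior. It assumes the simple noising model x = x_i + sigma * eps with standard Gaussian eps, under which the posterior weight of x_i is proportional to exp(-||x - x_i||^2 / (2 sigma^2)). This is not the paper's regime partition or its coverage criterion. The function names, the synthetic Gaussian training set, and the chosen sigma values are all hypothetical and serve only to show the posterior moving from concentrated on one training point to nearly uniform.

```python
import math
import random


def posterior_weights(x, train, sigma):
    """Posterior over training points given noisy x.

    Assumes x = x_i + sigma * eps with a uniform prior over the
    training set (an illustrative assumption, not the paper's setup).
    """
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * sigma ** 2)
        for p in train
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def max_weight_by_sigma(train, sigmas, seed=0):
    """Noise the first training point at each sigma and record the
    largest posterior weight assigned to any single training point."""
    rng = random.Random(seed)
    out = {}
    for sigma in sigmas:
        x = [c + sigma * rng.gauss(0.0, 1.0) for c in train[0]]
        out[sigma] = max(posterior_weights(x, train, sigma))
    return out


# Hypothetical toy data: 32 training points in 16 dimensions.
data_rng = random.Random(1)
train = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]

result = max_weight_by_sigma(train, sigmas=[0.05, 1.0, 50.0])
for sigma, w in result.items():
    print(f"sigma={sigma}: max posterior weight = {w:.3f}")
```

In this toy setting the largest weight is close to 1 at the smallest sigma and close to 1/32 at the largest. What happens between those extremes depends on the data geometry, which is the question the paper's framework addresses.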
Abstract
Diffusion models generate high-quality samples but can also memorize training data, raising serious privacy concerns. Understanding the mechanisms governing when memorization versus generalization occurs remains an active area of research. In particular, it is unclear where along the noise schedule memorization is induced, how data geometry influences it, and how phenomena at different noise scales interact. We introduce a geometric framework that partitions the noise schedule into three regimes based on the coverage properties of training data by Gaussian shells and the concentration behavior of the posterior, which we argue are two fundamental objects governing memorization and generalization in diffusion models. This perspective reveals that memorization risk is highly non-uniform across noise levels. We further identify a danger zone at medium noise levels where memorization is most…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
