Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models

Nick Dodson; Xinyu Gao; Qingsong Wang; Yusu Wang; Zhengchao Wan

arXiv:2602.17846 · cs.LG · February 23, 2026

TL;DR

This paper introduces a geometric framework to understand when diffusion models memorize training data versus generalize, revealing non-uniform risks across noise levels and proposing interventions to reduce memorization.

Contribution

The paper offers a geometric perspective on memorization in diffusion models, identifies a danger zone at medium noise levels, and proposes targeted mitigation strategies.

Findings

01. Memorization risk varies non-uniformly across noise levels.
02. Medium noise levels pose the highest memorization risk.
03. Small and large noise regimes resist memorization through different mechanisms.

Abstract

Diffusion models generate high-quality samples but can also memorize training data, raising serious privacy concerns. Understanding the mechanisms governing when memorization versus generalization occurs remains an active area of research. In particular, it is unclear where along the noise schedule memorization is induced, how data geometry influences it, and how phenomena at different noise scales interact. We introduce a geometric framework that partitions the noise schedule into three regimes based on the coverage properties of training data by Gaussian shells and the concentration behavior of the posterior, which we argue are two fundamental objects governing memorization and generalization in diffusion models. This perspective reveals that memorization risk is highly non-uniform across noise levels. We further identify a danger zone at medium noise levels where memorization is most…
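
The abstract's notion of posterior concentration can be illustrated with a toy experiment. The sketch below is not the authors' framework or code. It assumes a variance-exploding noising model, x = x_i + sigma * eps, with a uniform prior over a synthetic training set, in which case the posterior over training points given a noisy sample is a softmax of negative squared distances scaled by 2 * sigma^2. The data, the dimensions, the chosen sigma values, and the helper names `posterior_weights` and `effective_support` are all hypothetical.

```python
import numpy as np

def posterior_weights(x_noisy, train, sigma):
    """Posterior over training points, assuming x_noisy = x_i + sigma * eps
    with a uniform prior over the rows of `train` (an illustrative assumption)."""
    sq_dists = np.sum((train - x_noisy) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    logits -= logits.max()  # subtract the max to keep the softmax stable
    w = np.exp(logits)
    return w / w.sum()

def effective_support(weights):
    """exp(entropy) of the weights: close to 1 when the posterior sits on a
    single training point, close to n when it is nearly uniform."""
    nz = weights[weights > 0]  # skip entries that underflowed to zero
    return float(np.exp(-np.sum(nz * np.log(nz))))

rng = np.random.default_rng(0)
n, d = 200, 16
train = rng.standard_normal((n, d))  # synthetic stand-in for a training set
x0 = train[0]

results = {}
for sigma in (0.05, 1.0, 50.0):
    x_noisy = x0 + sigma * rng.standard_normal(d)
    w = posterior_weights(x_noisy, train, sigma)
    results[sigma] = effective_support(w)
    print(f"sigma={sigma:>5}: effective support = {results[sigma]:.1f} of {n}")
```

In this toy setting the posterior collapses onto one training point at small noise and spreads over almost the whole set at large noise. How the intermediate regime behaves depends on the data geometry, which is the question the paper's framework addresses.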

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference