Memorization and Generalization in Generative Diffusion under the Manifold Hypothesis
Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello, Marc Mézard, Enrico Ventura

TL;DR
This paper analyzes how diffusion models memorize and generalize structured data lying on manifolds. It shows that data structure enhances generalization and that, although memorization and generalization are distinct phases, the best generalization occurs within the memorization phase.
Contribution
It provides a theoretical framework connecting diffusion models' behavior to the Random Energy Model, characterizing memorization and generalization phases in high-dimensional structured data.
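The Random Energy Model mentioned above can be illustrated with a small toy. It uses P = 2^N i.i.d. Gaussian energies of variance N/2 (Derrida's normalization), whose Gibbs measure condenses onto a few low-energy states once the inverse temperature exceeds β_c = 2√(ln 2). The sketch below, in plain Python, tracks the inverse participation ratio Y = Σ_μ w_μ². It illustrates the standard REM only, not the paper's specific mapping of the empirical score onto it, and the function names are hypothetical.

```python
import math
import random


def sample_rem_energies(n, rng):
    """Draw P = 2**n i.i.d. Gaussian energies with variance n/2 (Derrida's REM)."""
    sigma = math.sqrt(n / 2.0)
    return [rng.gauss(0.0, sigma) for _ in range(2 ** n)]


def rem_inverse_participation(energies, beta):
    """Return Y = sum_mu w_mu^2 for Gibbs weights w_mu proportional to exp(-beta * E_mu).

    Y is of order 1/P when the measure is spread over many states and of
    order one when it condenses onto a few low-energy states.
    """
    # Shift by the minimum energy so the largest exponent is 0 (avoids overflow).
    e_min = min(energies)
    boltz = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(boltz)
    return sum((b / z) ** 2 for b in boltz)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 16
    energies = sample_rem_energies(n, rng)
    beta_c = 2.0 * math.sqrt(math.log(2.0))  # REM condensation point, about 1.665
    for beta in (0.0, 0.5, beta_c, 4.0, 8.0):
        print(f"beta={beta:.3f}  Y={rem_inverse_participation(energies, beta):.4g}")
```

Y stays close to 1/P at small β and grows to order one beyond β_c. At finite N the crossover is smooth rather than sharp.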
Findings
Memorization time $t_c$ decreases with data structure complexity.
Generalization is optimized at a time $t_g < t_c$, which the reverse process reaches after collapse, i.e., within the memorization phase.
Structured data avoids the curse of dimensionality in memorization.
Abstract
We study the memorization and generalization capabilities of Diffusion Models (DMs) when data lies on a structured latent manifold. Specifically, we consider a set of $P$ data points in $N$ dimensions confined to a latent subspace of dimension $D = \alpha_D N$, following the Hidden Manifold Model (HMM). We analyze the reverse diffusion process using the empirical score function as a proxy, and characterize it in the high-dimensional limit $P = e^{\alpha N}$, $N \gg 1$, by exploiting a connection with the Random Energy Model (REM). We show that a characteristic time $t_o$ marks the emergence of traps in the time-dependent potential, which however do not affect typical trajectories. The size of their basins of attraction is computed at all times. We derive the collapse time $t_c$, at which trajectories fall into the basin of a training point, signaling memorization. An explicit…
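The setup in the abstract (HMM data, empirical score, reverse diffusion, collapse onto a training point) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes an Ornstein-Uhlenbeck forward process with $x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}}\, z$. It also assumes a common form of the Hidden Manifold Model, $x = \phi(F c / \sqrt{D})$ with Gaussian $F \in \mathbb{R}^{N \times D}$, Gaussian latent $c \in \mathbb{R}^D$ and $\phi = \tanh$. As a rough collapse indicator, it uses the first time the largest posterior weight over training points exceeds 1/2. That indicator is a hypothetical proxy, not the paper's definition of $t_c$, and all function names are illustrative.

```python
import math
import random


def hidden_manifold_data(p, n, d, rng, phi=math.tanh):
    """Sample P points in R^N lying on a D-dimensional hidden manifold: x = phi(F c / sqrt(D))."""
    feats = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # N x D matrix F
    data = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(d)]  # latent coordinates of one point
        data.append([phi(sum(f_k * c_k for f_k, c_k in zip(row, c)) / math.sqrt(d)) for row in feats])
    return data


def posterior_weights(x, data, t):
    """Softmax weights w_mu over training points, given x at time t of the OU forward process."""
    decay, delta = math.exp(-t), 1.0 - math.exp(-2.0 * t)
    logits = [-sum((xi - decay * yi) ** 2 for xi, yi in zip(x, y)) / (2.0 * delta) for y in data]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(lg - m) for lg in logits]
    z = sum(w)
    return [wi / z for wi in w]


def empirical_score(x, data, t):
    """Exact score of the noised empirical measure: sum_mu w_mu (e^{-t} y_mu - x) / Delta_t."""
    decay, delta = math.exp(-t), 1.0 - math.exp(-2.0 * t)
    w = posterior_weights(x, data, t)
    return [sum(wm * (decay * y[i] - x[i]) for wm, y in zip(w, data)) / delta for i in range(len(x))]


def reverse_diffusion(data, t_max=4.0, t_min=0.01, steps=400, rng=None):
    """Euler-Maruyama integration of the reverse SDE driven by the empirical score.

    Returns the final sample and the first time at which one training point
    carries more than half of the posterior weight (a rough collapse proxy).
    """
    if rng is None:
        rng = random.Random()
    x = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]  # start from N(0, I)
    h = (t_max - t_min) / steps
    t, collapse_t = t_max, None
    for _ in range(steps):
        s = empirical_score(x, data, t)
        # Backward step for the forward SDE dx = -x dt + sqrt(2) dW: x <- x + h (x + 2 s) + sqrt(2 h) z
        x = [xi + h * (xi + 2.0 * si) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0) for xi, si in zip(x, s)]
        t -= h
        if collapse_t is None and max(posterior_weights(x, data, t)) > 0.5:
            collapse_t = t
    return x, collapse_t


if __name__ == "__main__":
    rng = random.Random(0)
    data = hidden_manifold_data(p=20, n=20, d=4, rng=rng)
    sample, t_collapse = reverse_diffusion(data, rng=rng)
    print("collapse proxy time:", t_collapse)
```

Because the empirical score is the exact score of the noised training set, trajectories integrated down to small $t$ necessarily end near a training point. The sketch only shows when that happens along one trajectory. It does not reproduce the asymptotic formula for $t_c$ derived in the paper.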
