Bigger Isn't Always Memorizing: Early Stopping Overparameterized Diffusion Models
Alessandro Favero, Antonio Sclocchi, Matthieu Wyart

TL;DR
This paper shows that overparameterized diffusion models generalize well before they start memorizing their training data, so early stopping at a training time set by the dataset size can capture generalization while avoiding memorization.
Contribution
The paper shows that generalization occurs before memorization in overparameterized diffusion models and introduces a phase diagram to explain this phenomenon.
Findings
Generalization precedes memorization in overparameterized diffusion models.
Memorization time scales proportionally with dataset size.
Early stopping guided by dataset size improves generalization and privacy.
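The findings above suggest a simple stopping rule: if memorization time grows in proportion to the number of training examples, a training-step budget can be scaled with the dataset size. The following is only a minimal sketch of that idea, not the authors' procedure. The function names, the proportionality constant `steps_per_example`, and the `safety_fraction` margin are all hypothetical; the constant would have to be calibrated empirically for a given model and dataset.

```python
def memorization_step_budget(num_train_examples: int, steps_per_example: float) -> int:
    """Rough estimate of the training step at which memorization begins.

    Assumes the linear law tau_mem = c * n, where n is the dataset size and
    c (steps_per_example) is a hypothetical, empirically calibrated constant.
    """
    if num_train_examples <= 0:
        raise ValueError("num_train_examples must be positive")
    if steps_per_example <= 0:
        raise ValueError("steps_per_example must be positive")
    return int(steps_per_example * num_train_examples)


def should_stop(step: int, num_train_examples: int, steps_per_example: float,
                safety_fraction: float = 0.5) -> bool:
    """Return True once training reaches a fraction of the estimated memorization time.

    safety_fraction is a hypothetical margin: stopping at a fraction below 1
    keeps training short of the estimated onset of memorization.
    """
    budget = memorization_step_budget(num_train_examples, steps_per_example)
    return step >= safety_fraction * budget


# Illustrative values only: 10,000 examples and an assumed constant of 3 steps per example.
budget = memorization_step_budget(10_000, 3.0)
stop_step = next(s for s in range(budget + 1) if should_stop(s, 10_000, 3.0))
print(budget, stop_step)
```

Under this rule, doubling the dataset doubles the step budget, which is the "competition between time scales" view in its simplest form. In practice one would likely combine such a budget with a direct memorization check, such as comparing generated samples against their nearest training examples.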
Abstract
Diffusion probabilistic models have become a cornerstone of modern generative AI, yet the mechanisms underlying their generalization remain poorly understood. In fact, if these models were perfectly minimizing their training loss, they would just generate data belonging to their training set, i.e., memorize, as empirically found in the overparameterized regime. We revisit this view by showing that, in highly overparameterized diffusion models, generalization in natural data domains is progressively achieved during training before the onset of memorization. Our results, ranging from image to language diffusion models, systematically support the empirical law that memorization time is proportional to the dataset size. Generalization vs. memorization is then best understood as a competition between time scales. We show that this phenomenology is recovered in diffusion models learning a…
Taxonomy
Topics: Language and cultural evolution · Generative Adversarial Networks and Image Synthesis · Topic Modeling
Methods: Diffusion · Early Stopping
