Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
Wei Huang, Andi Han, Mingyuan Bai, Huanjian Zhou, Qixin Zhang, Taiji Suzuki, Kenji Fukumizu

TL;DR
This paper gives a theoretical explanation for how diffusion models efficiently learn data supported on low-dimensional manifolds, and introduces Score-induced Latent Diffusion (SiLD), a framework that matches or outperforms VAE-based latent diffusion in generation quality and improves reconstruction.
Contribution
The paper identifies a collapse-and-refine mechanism in diffusion model training and proposes SiLD, a two-stage score-based framework that uses this mechanism to learn the data manifold and then the density on it.
Findings
SiLD matches or outperforms VAE-based models in generation quality.
SiLD improves reconstruction accuracy across benchmarks.
Sample complexity depends on intrinsic dimension, not ambient dimension.
Abstract
Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, remains theoretically unexplained. We identify a collapse-and-refine mechanism driven by the geometry of the score function itself: at small noise scales, the diverging singularity of the score drives a rapid dimensional collapse of the induced denoising map onto the data manifold projection; at moderate noise scales, training refines the intrinsic density on the learned manifold. We instantiate this principle as Score-induced Latent Diffusion (SiLD), a two-stage framework in which both manifold learning and density estimation emerge from a single denoising score matching objective, replacing the heuristic KL regularization of VAE-based latent diffusion models. We…
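The collapse half of the mechanism can be checked by hand in a toy case. The sketch below is not the paper's SiLD method. It assumes a linear "manifold" (a d-dimensional subspace of R^D with a standard Gaussian latent) and variance-exploding noise y = x + sigma * eps, so the noisy density is N(0, A A^T + sigma^2 I) and the score is available in closed form. The names make_subspace, exact_score, and denoise are hypothetical helpers introduced for illustration. Under these assumptions, the score component normal to the subspace grows like 1/sigma^2, and the Tweedie denoiser y + sigma^2 * score(y) approaches the orthogonal projection of y onto the subspace as sigma shrinks.

```python
import numpy as np

def make_subspace(D, d, seed=0):
    """Return a D x d matrix with orthonormal columns (a toy linear 'manifold')."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return Q

def exact_score(y, A, sigma):
    """Score of N(0, A A^T + sigma^2 I) at y: -(A A^T + sigma^2 I)^{-1} y."""
    D = A.shape[0]
    cov = A @ A.T + sigma**2 * np.eye(D)
    return -np.linalg.solve(cov, y)

def denoise(y, A, sigma):
    """Tweedie denoiser induced by the score: y + sigma^2 * score(y)."""
    return y + sigma**2 * exact_score(y, A, sigma)

D, d = 20, 3                      # ambient and intrinsic dimension (toy values)
A = make_subspace(D, d)
P = A @ A.T                       # orthogonal projector onto the subspace
y = np.random.default_rng(1).standard_normal(D)

results = {}
for sigma in (1.0, 0.1, 0.01):
    s = exact_score(y, A, sigma)
    normal_norm = np.linalg.norm((np.eye(D) - P) @ s)          # off-manifold score
    gap = np.linalg.norm(denoise(y, A, sigma) - P @ y)         # distance to projection
    results[sigma] = (normal_norm, gap)
    print(f"sigma={sigma:5.2f}  |normal score|={normal_norm:10.3e}  "
          f"|denoise(y) - P y|={gap:.3e}")
```

In this linear-Gaussian setting the denoiser equals P y / (1 + sigma^2), so its Jacobian has rank d at every noise level. The curved manifolds treated in the paper are more subtle, and this example only illustrates the small-noise limit.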
