Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

Wei Huang; Andi Han; Mingyuan Bai; Huanjian Zhou; Qixin Zhang; Taiji Suzuki; Kenji Fukumizu

arXiv:2605.20235·cs.LG·May 21, 2026

TL;DR

This paper gives a theoretical explanation of how diffusion models learn efficiently when data lie on a low-dimensional manifold, and introduces Score-induced Latent Diffusion (SiLD), a framework that improves generation and reconstruction.

Contribution

It identifies a collapse-and-refine mechanism in diffusion models and proposes SiLD, a two-stage score-based framework that leverages this mechanism for better manifold learning.

Findings

01. SiLD matches or outperforms VAE-based models in generation quality.

02. SiLD improves reconstruction accuracy across benchmarks.

03. Sample complexity depends on intrinsic dimension, not ambient dimension.

Abstract

Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, remains theoretically unexplained. We identify a collapse-and-refine mechanism driven by the geometry of the score function itself: at small noise scales, the diverging singularity of the score drives a rapid dimensional collapse of the induced denoising map onto the data manifold projection; at moderate noise scales, training refines the intrinsic density on the learned manifold. We instantiate this principle as Score-induced Latent Diffusion (SiLD), a two-stage framework in which both manifold learning and density estimation emerge from a single denoising score matching objective, replacing the heuristic KL regularization of VAE-based latent diffusion models. We…
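The collapse half of the mechanism described above can be illustrated on a toy problem. The sketch below is not the paper's SiLD method. It assumes a variance-exploding noising x = x0 + sigma * eps (the paper's exact parameterization is not given in this excerpt), uses evenly spaced points on the unit circle as a stand-in for a 1-D manifold in 2-D, and evaluates the exact score of the Gaussian-smoothed empirical distribution in place of a trained network. The denoising map follows from Tweedie's formula, D(x) = x + sigma^2 * score(x). The function names `denoiser` and `score` are illustrative only.

```python
import numpy as np


def denoiser(x, data, sigma):
    """Posterior mean E[x0 | x] when x = x0 + sigma * eps and x0 is uniform over `data`."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # stabilise the softmax for small sigma
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ data


def score(x, data, sigma):
    """Exact score of the smoothed empirical density, via Tweedie's formula."""
    return (denoiser(x, data, sigma) - x) / sigma ** 2


# Toy manifold (an assumption for illustration): the unit circle in R^2.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
data = np.stack([np.cos(theta), np.sin(theta)], axis=1)

x = np.array([1.5, 0.0])            # off-manifold query point
projection = x / np.linalg.norm(x)  # nearest point on the circle

results = {}
for sigma in (0.5, 0.05):
    gap = np.linalg.norm(denoiser(x, data, sigma) - projection)
    score_norm = np.linalg.norm(score(x, data, sigma))
    results[sigma] = (gap, score_norm)
    print(f"sigma={sigma}: |D(x) - proj(x)| = {gap:.4f}, |score(x)| = {score_norm:.1f}")
```

In this toy setting, shrinking sigma makes the score norm at an off-manifold point grow while the denoising map moves toward the manifold projection, which matches the qualitative behaviour the abstract attributes to small noise scales. It says nothing about the refine stage or about sample complexity.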
