A Geometric Perspective on Diffusion Models
Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

TL;DR
This paper analyzes the ODE-based sampling of variance-exploding diffusion models from a geometric viewpoint, characterizing the shape of the sampling trajectories and relating optimal sampling to the classical mean-shift algorithm.
Contribution
It introduces a geometric perspective on diffusion model sampling, uncovering the structure of trajectories and establishing a theoretical link to mean-shift algorithms.
Findings
Discovered smooth connections between data and noise distributions via quasi-linear trajectories
Identified an implicit denoising trajectory that converges faster than the sampling trajectory
Linked optimal ODE sampling to mean-shift algorithms
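The mean-shift link in the last finding can be illustrated with a minimal sketch. It assumes the data distribution is an empirical distribution over a finite set of points and that noise is added as x_t = x_0 + sigma * eps. Under those assumptions the posterior mean E[x_0 | x_t] is a softmax-weighted average of the data points, which has the same form as one Gaussian-kernel mean-shift update with bandwidth sigma. The function names `optimal_denoiser` and `mean_shift_step` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def optimal_denoiser(x, data, sigma):
    """Posterior mean E[x_0 | x_t = x] for an empirical data distribution
    under Gaussian noise of standard deviation sigma (assumed setting)."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()            # stabilise the softmax
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ data

def mean_shift_step(x, data, bandwidth):
    """One classical mean-shift update with a Gaussian kernel."""
    kernel = np.exp(-np.sum((data - x) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    return (kernel[:, None] * data).sum(axis=0) / kernel.sum()

# Toy 2-D dataset and query point (hypothetical values).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
x = np.array([0.3, -0.2])
sigma = 0.8

denoised = optimal_denoiser(x, data, sigma)
shifted = mean_shift_step(x, data, sigma)
```

In this toy setting the two functions return the same point up to floating-point error, since the softmax weights are just the normalised Gaussian kernel values.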
Abstract
Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models. A remarkable advancement is the use of stochastic differential equations (SDEs) and their marginal-preserving ordinary differential equations (ODEs) to describe data perturbation and generative modeling in a unified framework. In this paper, we carefully inspect the ODE-based sampling of a popular variance-exploding SDE and reveal several intriguing structures of its sampling dynamics. We discover that the data distribution and the noise distribution are smoothly connected with a quasi-linear sampling trajectory and another implicit denoising trajectory that even converges faster. Meanwhile, the denoising trajectory governs the curvature of the corresponding sampling trajectory and its finite differences yield various second-order samplers used in…
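As a rough illustration of the two trajectories described in the abstract, the sketch below runs Euler steps on the ODE written in the noise-level parameterization dx/dsigma = (x - D(x; sigma)) / sigma, where D is a denoiser. This form of the ODE, the noise schedule, and the closed-form denoiser for a toy empirical dataset are assumptions made for illustration; the paper works with trained networks and may differ in details.

```python
import numpy as np

def denoise(x, data, sigma):
    # Posterior mean under an empirical data distribution; an illustrative
    # stand-in for a trained denoising network.
    logits = -np.sum((data - x) ** 2, axis=1) / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())
    return (weights / weights.sum()) @ data

def euler_sample(x_init, data, sigmas):
    """Euler discretisation of dx/dsigma = (x - D(x; sigma)) / sigma.
    Returns the sampling trajectory and the denoising trajectory."""
    x = x_init
    sampling, denoising = [x], []
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = denoise(x, data, s_cur)                 # denoiser output at this step
        x = x + (s_next - s_cur) * (x - d) / s_cur  # Euler update
        sampling.append(x)
        denoising.append(d)
    return np.array(sampling), np.array(denoising)

# Hypothetical toy setup: 20 data points in 2-D, geometric noise schedule
# followed by a final step to sigma = 0.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 2))
sigmas = np.append(np.geomspace(80.0, 0.02, 30), 0.0)
x_init = sigmas[0] * rng.normal(size=2)

sampling, denoising = euler_sample(x_init, data, sigmas)
```

Each Euler update can be rewritten as (s_next / s_cur) * x + (1 - s_next / s_cur) * D(x; s_cur), a convex combination of the current sample and the denoiser output. This is why the final step to sigma = 0 lands on the last denoiser output.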
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- Very clear exposition and intuitive geometric explanation for some interesting observations on ODE-based sampling of VE diffusion models.
- Simple yet effective theory linking mean-shift algorithms to diffusion models which I have not seen before.
- Theory is supported by empirical evidence in large-scale diffusion models.
- My main issue with this work is that it is unclear exactly how to use these theoretical insights and observations to improve diffusion models and diffusion model sampling. While it is interesting to link mean-shift algorithms to diffusion models, it is not shown how this link leads to improved sampling. Similarly, while all the fast ODE solvers are shown to be related, it is unclear how to use this for better sampling.
- Theory is all quite simple (though useful), and is not particularly novel.
1. This paper combines a lot of analytical and experimental evidence to support the geometric understanding of the sampling trajectory and the denoising trajectory in EDM.
1. Lack of novelty. Most of the results/techniques have been observed/used in other papers.
1) The paper deals with an interesting problem, that of generative modeling using exploding-variance diffusion models.
2) To the best of my knowledge, Theorem 1 is new.
3) Experimental results are of some interest.
1) Mathematical writing is sloppy.
2) The observations are not sufficiently well justified, and their impact is limited.
3) Some of the mathematical results are trivial.
More specific remarks:
Section 2:
- There is an ambiguity between distribution and density function. Given the formula of the score function used, for instance, in (2), $p_t$ denotes the probability density function with respect to the Lebesgue measure. Then, it is written that $p_0 = p_d$ is the empirical data distribut
* The paper's geometrical analysis of the forward and backward dynamics associated with Variance Exploding-SDE diffusion models is interesting and adds a new dimension to the understanding of these models. This type of analysis can be crucial for developing a deeper understanding of the underlying mechanisms of SDE-based models and their behavior during the sampling and denoising processes. Such insights can potentially inform the design of more efficient or accurate diffusion models in the futu
* One major concern is that the authors focus only on the VE case (although they mention that a similar analysis applies to a variance-preserving SDE). While interesting, such a limitation reduces the generality of the claims. Some of the numerical observations might simply be artifacts of the considered SDE class, and do not provide information about diffusion models in general. Moreover, I find the different sections to be somewhat disconnected; in certain cases some results are presented as original
Taxonomy
Topics: Model Reduction and Neural Networks · Mathematical Biology Tumor Growth · Gaussian Processes and Bayesian Inference
Methods: Diffusion
