TL;DR
This paper uses a Gaussian Mixture Model (GMM) as the reverse transition kernel in DDIM. The change improves sample quality in diffusion models, especially at small numbers of sampling steps, as demonstrated on several datasets.
Contribution
It proposes a moment-matching GMM kernel for DDIM, enhancing sampling efficiency and quality over traditional Gaussian kernels, with extensive experimental validation.
Findings
The GMM kernel yields better sample quality than the Gaussian kernel at small step counts.
Using 10 sampling steps on ImageNet, the GMM kernel achieves an FID of 6.94 and an IS of 207.85.
The approach improves both unconditional and class-conditional diffusion models.
Abstract
We propose using a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically, we match the first and second order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples with equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ, class-conditional models trained on ImageNet, and text-to-image generation using Stable Diffusion v2.1 on COYO700M datasets respectively. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the…
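The moment-matching idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's parameterization: the function names `moment_matched_gmm` and `sample_gmm`, the `frac` parameter, the random offsets, and the choice of a single covariance shared by all components are assumptions made for illustration. The sketch only shows that a GMM can be constrained so its overall mean and covariance equal those of a target Gaussian with mean `mean` and covariance `var * I`.

```python
import numpy as np

def moment_matched_gmm(mean, var, weights, raw_offsets, frac=0.5):
    """Build a GMM whose overall mean is `mean` and overall covariance is var * I.

    Hypothetical helper: all components share one covariance matrix, and
    `frac` (0 < frac < 1) caps how much variance the spread of the
    component means may take up.
    """
    mean = np.asarray(mean, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    offsets = np.asarray(raw_offsets, dtype=float)
    # Center the offsets so their weighted sum is zero: first moment matches.
    offsets = offsets - weights @ offsets
    # Covariance contributed by the spread of the component means.
    between = (offsets * weights[:, None]).T @ offsets
    top = np.linalg.eigvalsh(between)[-1]
    if top > 0:
        scale_sq = frac * var / top
        offsets = offsets * np.sqrt(scale_sq)
        between = between * scale_sq
    # The shared component covariance absorbs the rest: second moment matches.
    comp_cov = var * np.eye(mean.size) - between
    return mean + offsets, comp_cov, weights

def sample_gmm(means, cov, weights, n, rng):
    """Draw n samples from a GMM with a shared component covariance."""
    ks = rng.choice(len(weights), size=n, p=weights)
    chol = np.linalg.cholesky(cov)
    noise = rng.standard_normal((n, means.shape[1])) @ chol.T
    return means[ks] + noise

rng = np.random.default_rng(0)
target_mean = np.array([1.0, -2.0, 0.5])
target_var = 0.3
means, cov, w = moment_matched_gmm(
    target_mean, target_var, [0.2, 0.3, 0.5], rng.standard_normal((3, 3))
)
samples = sample_gmm(means, cov, w, 5, rng)
```

In a sampler, `target_mean` and `target_var` would stand for the mean and variance that the Gaussian DDIM kernel would otherwise use at a given step; how the paper actually chooses the mixture weights and offsets is not covered by this sketch.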
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The paper proposes a variant of DDIM, utilizing a Gaussian Mixture Model as the reverse transition kernel.
- The paper suggests that moment matching is sufficient for producing samples of equal or superior quality compared to the original DDIM.
- The presentation is clear and easy to understand.
- DDIM is typically employed with $\eta=0$. Although the proposed method appears to significantly enhance the performance of the original DDIM when $\eta\neq 0$, numerical results suggest that its performance is generally inferior compared to DDIM with $\eta=0$. For instance, in Fig. 1, the FID scores range from 25 to 35 when $\eta=0$ and the number of steps is 10, whereas they range from 60 to 70 when $\eta=1.0$ and the number of steps is 10. Furthermore, as the number of steps increases beyond
Taking advantage of GMM in DDIM sampling appears to be a sensible approach, and is presumably more capable than a unimodal Gaussian. The main hurdle of using GMM would be the increased complexity and the lack of parameters learning schemes. The highlighted contribution of this paper is to provide a feasible moment matching approach for choosing GMM parameters. As far as I checked, this technique is technically sound.
The main problem of this paper would be clarity.
- There are numerous ill-defined variables and formulas in the main paper, which, to a large extent, hinder readers' understanding. The current presentation is poor enough that I struggle to read all the derivations in the main paper and the appendix. I suggest to **bold** all vectors and matrices, following the usual practice of ICLR papers, to differentiate them from scalars. For example, in Eq. (9), it is very hard to tell how $O_t$ could possib
* The proposed method can represent transitions with more parameters at each transition than DDIM with a Gaussian kernel. Without additional training, the method can affordably enhance the expressiveness over the Gaussian kernel by adding parameters.
* **Limited novelty.** It is an incremental approach to the DDIM sampling method w/ a Gaussian kernel. It only shows a comparison with DDIM w/ a Gaussian kernel, without performance comparisons with other methods.
* **Marginal improvement.** Looking at the FID and IS results in Figures 1, 2, and 3, and the tables in the Appendix, the performance improvement over DDIM is marginal. The proposed method only shows a slight performance improvement at fewer sampling steps (around 10 steps) where DDIM
+ The authors provide mathematical proof that the GMM-based sampling algorithm can be used for models obtained by DDPM training.
+ The proposed method performs better in both conditional and unconditional generation than the DDIM method.
+ The motivation for this paper is confusing; why use a Gaussian Mixture Model (GMM) and what are the benefits of such an assumption?
+ The experiment is weak and only compared the DDIM method as a baseline, but other methods for improved sampling (e.g. DPM-Solver[1]) are not compared. Meanwhile, since DDIM is an ODE-based method, the relationship between DDIM-GMM and ODE should also be discussed
[1] Lu C, Zhou Y, Bao F, et al. Dpm-solver: A fast ode solver for diffusion probabilistic model samp
Taxonomy
Topics: Bayesian Methods and Mixture Models · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
