Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim, Hao Tang, and Fisher Yu

TL;DR
This paper introduces D-ODE solvers, a distillation-based method that improves the efficiency and quality of diffusion model sampling by combining strengths of existing ODE solvers with minimal overhead.
Contribution
The paper proposes a simple, effective distillation approach for ODE solvers in diffusion models, enabling faster sampling with fewer function evaluations and better trajectory tracking.
Findings
D-ODE solvers outperform existing ODE solvers like DDIM and PNDM in low NFE scenarios.
The method achieves high-quality image generation with negligible additional computational cost.
Qualitative analysis confirms improved trajectory fidelity and image quality.
Abstract
Diffusion models have recently gained prominence as a novel category of generative models. Despite their success, these models face a notable drawback in terms of slow sampling speeds, requiring a high number of function evaluations (NFE) in the order of hundreds or thousands. In response, both learning-free and learning-based sampling strategies have been explored to expedite the sampling process. Learning-free sampling employs various ordinary differential equation (ODE) solvers based on the formulation of diffusion ODEs. However, it encounters challenges in faithfully tracking the true sampling trajectory, particularly for small NFE. Conversely, learning-based sampling methods, such as knowledge distillation, demand extensive additional training, limiting their practical applicability. To overcome these limitations, we introduce Distilled-ODE solvers (D-ODE solvers), a…
Peer Reviews
Decision·ICLR 2024 Conference Withdrawn Submission
- The proposed method is parallel with other well-designed fast samplers and can be used to further improve them.
- The experiments show the effectiveness of the proposed method.
- Major:
  - The writing needs to be greatly improved; it is rather hard to follow the paper by reading only the main text. For example, **$D_t$ and $D_\theta$ are not defined in the main text**, so I could not understand the main method until I carefully read the appendix. Please add a rigorous definition of the "denoising prediction".
  - Lack of a detailed discussion of multi-step ODE solvers, such as DEIS[1] and DPM-Solver++[2]. In fact, the proposed linear combination of current
- The writing of this paper is clear and self-contained, which helps the reviewer to quickly follow.
- Rather than distilling the denoising network, this paper proposes a new method to distill the ODE solver.
- The experiment results show improvement over the baseline methods.
- The improvement over the baseline methods is limited, as the results in Figure 3 and Figure 4 show, especially for the DEIS3 ODE solver. The improvements on DDIM and EDM Heun are larger, but these two ODE solvers do not perform well when NFEs are small. Although the proposed method requires less computational time than methods like Consistency Distillation or Progressive Distillation, the weak performance hinders its application in practice.
- Seems to miss quantitative evaluat
Unlike progressive distillation, which needs to fine-tune the student model, this paper optimizes the scalar lambda_i in front of a gradient vector at each diffusion timestep. The gradient vector is a function of the estimated clean images. The optimization is very cheap since it only involves a quadratic problem. The resulting FID improvement is reasonable.
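The per-timestep quadratic problem described in this review can be illustrated with a minimal sketch. It assumes, without confirmation from the text above, that the corrected direction takes the form d_curr + lambda * (d_curr - d_prev) and that a teacher target d_target is available from a finer-step solver. The function name fit_lambda and all array names are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_lambda(d_curr, d_prev, d_target):
    """Closed-form scalar least squares (illustrative assumption, not the paper's code).

    Minimizes || d_target - (d_curr + lam * (d_curr - d_prev)) ||^2 over the scalar lam.
    """
    diff = (d_curr - d_prev).ravel()     # direction that lam scales
    resid = (d_target - d_curr).ravel()  # what the correction has to explain
    denom = float(diff @ diff)
    if denom == 0.0:
        # Current and previous predictions coincide, so no correction is possible.
        return 0.0
    return float(resid @ diff) / denom

# Synthetic batch of "denoising predictions" shaped like small images.
rng = np.random.default_rng(0)
d_prev = rng.normal(size=(4, 3, 8, 8))
d_curr = rng.normal(size=(4, 3, 8, 8))

# Build a target that is exactly reachable with lam = 0.3, then recover it.
true_lambda = 0.3
d_target = d_curr + true_lambda * (d_curr - d_prev)
lam = fit_lambda(d_curr, d_prev, d_target)
print(lam)
```

Because the objective is quadratic in a single scalar, each timestep needs only two dot products over a batch of samples and no gradient-based training, which is consistent with the reviewer's remark that the optimization is cheap.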
My main concern is the novelty of this paper. An arXiv paper released in April this year proposes a similar research idea. The paper title is "On Accelerating Diffusion-Based Sampling Process via Improved Integration Approximation". The arXiv paper considered improving the sampling performance of EDM, DDIM, DPM-Solver, and SPNDM for small NFEs. One main difference between the two papers is that the arXiv paper considered optimizing a number of coefficients in front of some gradient vec
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Lattice Boltzmann Simulation Studies
Methods: Knowledge Distillation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
