Distilling ODE Solvers of Diffusion Models into Smaller Steps

Sanghwan Kim, Hao Tang, and Fisher Yu

arXiv:2309.16421 · cs.CV · March 28, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces D-ODE solvers, a distillation-based method that improves the efficiency and quality of diffusion model sampling by combining strengths of existing ODE solvers with minimal overhead.

Contribution

The paper proposes a simple, effective distillation approach for ODE solvers in diffusion models, enabling faster sampling with fewer function evaluations and better trajectory tracking.

Findings

01. D-ODE solvers outperform existing ODE solvers like DDIM and PNDM in low NFE scenarios.

02. The method achieves high-quality image generation with negligible additional computational cost.

03. Qualitative analysis confirms improved trajectory fidelity and image quality.

Abstract

Diffusion models have recently gained prominence as a novel category of generative models. Despite their success, these models face a notable drawback in terms of slow sampling speeds, requiring a high number of function evaluations (NFE) on the order of hundreds or thousands. In response, both learning-free and learning-based sampling strategies have been explored to expedite the sampling process. Learning-free sampling employs various ordinary differential equation (ODE) solvers based on the formulation of diffusion ODEs. However, it encounters challenges in faithfully tracking the true sampling trajectory, particularly for small NFE. Conversely, learning-based sampling methods, such as knowledge distillation, demand extensive additional training, limiting their practical applicability. To overcome these limitations, we introduce Distilled-ODE solvers (D-ODE solvers), a…

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- The proposed method is parallel to other well-designed fast samplers and can be used to further improve them.
- The experiments show the effectiveness of the proposed method.

Weaknesses

- Major:
  - The writing needs to be greatly improved, and it is rather hard to follow the paper by reading only the main text. For example, **$D_t$ and $D_\theta$ are not defined in the main text**, so I could not understand the main method until I carefully read the appendix. Please add a rigorous definition of the "denoising prediction".
  - Lack of a detailed discussion of multi-step ODE solvers, such as DEIS[1] and DPM-Solver++[2]. In fact, the proposed linear combination of current

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The writing of this paper is clear and self-contained, which helps the reviewer to quickly follow.
- Rather than distilling the denoising network, this paper proposes a new method to distill the ODE solver.
- The experiment results show improvement over the baseline methods.

Weaknesses

- The improvement over the baseline methods is limited, as the results in Figure 3 and Figure 4 show, especially for the DEIS3 ODE solver. The improvements on DDIM and EDM Heun are larger, but these two ODE solvers do not perform well when NFEs are small. Although the proposed method requires less computational time than methods like Consistency Distillation or Progressive Distillation, the weak performance hinders its application in practice.
- Seems to miss quantitative evaluat

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

Unlike progressive distillation, which needs to fine-tune the student model, this paper optimizes a scalar $\lambda_i$ in front of a gradient vector at each diffusion timestep. The gradient vector is a function of the estimated clean images. The optimization process is very cheap since it only involves quadratic optimization. The resulting FID improvement is reasonable.
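The "quadratic optimization" mentioned in this review can be illustrated with a scalar least-squares fit. The sketch below is only a toy under stated assumptions, not the paper's actual objective or code: it assumes a student update of the form `base + lam * direction`, a teacher output `target` to be matched at one timestep, and it minimizes the squared error over the single scalar `lam`. All function names, variable names, and numbers are hypothetical.

```python
def fit_lambda(base, direction, target):
    """Closed-form minimizer of sum((base + lam * direction - target) ** 2).

    Setting the derivative with respect to lam to zero gives
    lam = <direction, target - base> / <direction, direction>.
    """
    num = sum(d * (t - b) for b, d, t in zip(base, direction, target))
    den = sum(d * d for d in direction)
    if den == 0.0:
        # A zero direction cannot change the output; any lam is optimal.
        return 0.0
    return num / den


def apply_step(base, direction, lam):
    """Student output for a given scalar: base + lam * direction."""
    return [b + lam * d for b, d in zip(base, direction)]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data for a single timestep (hypothetical values, flattened to 1-D lists).
base = [0.0, 1.0, 2.0]        # student output before the correction
direction = [1.0, 0.0, -1.0]  # vector that the scalar multiplies
target = [0.5, 1.0, 1.5]      # teacher output to be matched

lam = fit_lambda(base, direction, target)
student = apply_step(base, direction, lam)
error = squared_error(student, target)
```

Because the objective is quadratic in one scalar, each timestep needs only two inner products and no gradient-based training, which is consistent with the reviewer's remark that the optimization is very cheap. In practice one such scalar would be fitted per timestep, typically over a batch of samples rather than a single vector.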

Weaknesses

My main concern is the novelty of this paper. An arXiv paper released in April this year pursues a research idea similar to this paper's. The paper title is "On Accelerating Diffusion-Based Sampling Process via Improved Integration Approximation". The arXiv paper considered improving the sampling performance of EDM, DDIM, DPM-Solver, and SPNDM for small NFEs. One main difference between the two papers is that the arXiv paper considered optimizing a number of coefficients in front of some gradient vec

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Lattice Boltzmann Simulation Studies

Methods: Knowledge Distillation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion