Learning to Discretize Denoising Diffusion ODEs
Vinh Tong, Hoang Trung-Dung, Anji Liu, Guy Van den Broeck, Mathias Niepert

TL;DR
This paper introduces LD3, a lightweight framework that learns optimal time discretization to improve sampling efficiency in diffusion probabilistic models, reducing computational costs while maintaining high-quality generation.
Contribution
LD3 is a novel method that learns the optimal discretization for sampling in diffusion models without retraining the entire network.
Findings
LD3 improves sampling efficiency with less computational overhead.
Achieves state-of-the-art FID scores with fewer neural function evaluations.
Demonstrates effectiveness across multiple pre-trained models and sampling settings.
Abstract
Diffusion Probabilistic Models (DPMs) are generative models showing competitive performance in various domains, including image synthesis and 3D point cloud generation. Sampling from pre-trained DPMs involves multiple neural function evaluations (NFEs) to transform Gaussian noise samples into images, resulting in higher computational costs compared to single-step generative models such as GANs or VAEs. Therefore, reducing the number of NFEs while preserving generation quality is crucial. To address this, we propose LD3, a lightweight framework designed to learn the optimal time discretization for sampling. LD3 can be combined with various samplers and consistently improves generation quality without having to retrain resource-intensive neural networks. We demonstrate analytically and empirically that LD3 improves sampling efficiency with much less computational overhead. We evaluate our…
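The idea of learning a time discretization can be illustrated with a small toy sketch. It is not the authors' implementation: the function names (`timesteps_from_params`, `euler_solve`, `toy_drift`, `learn_discretization`), the scalar toy ODE, the plain squared-error objective against a fine-grid "teacher" solution, and the derivative-free coordinate search are all assumptions made for illustration. The sketch only shows how unconstrained parameters can be mapped to a monotone time grid and tuned so that a few-step solver tracks an accurate solution more closely than a uniform grid does.

```python
import math

def timesteps_from_params(xi, t_start=1.0, t_end=0.0):
    """Map unconstrained parameters xi to a monotone grid from t_start to t_end."""
    m = max(xi)
    weights = [math.exp(v - m) for v in xi]  # softmax numerators, all positive
    total = sum(weights)
    ts, acc = [t_start], 0.0
    for w in weights:
        acc += w / total  # cumulative fraction of the interval covered so far
        ts.append(t_start + (t_end - t_start) * acc)
    ts[-1] = t_end  # pin the endpoint against rounding error
    return ts

def euler_solve(drift, x, ts):
    """Explicit Euler integration of dx/dt = drift(x, t) along the grid ts."""
    for t_cur, t_next in zip(ts, ts[1:]):
        x = x + (t_next - t_cur) * drift(x, t_cur)
    return x

def toy_drift(x, t):
    """Hypothetical stand-in for a learned denoiser: dx/dt = 3 t^2 x."""
    return 3.0 * t * t * x

def student_error(xi, x_init, target):
    """Squared gap between the few-step solution and the teacher output."""
    x_end = euler_solve(toy_drift, x_init, timesteps_from_params(xi))
    return (x_end - target) ** 2

def learn_discretization(num_steps=4, x_init=1.0, rounds=50):
    # Teacher: the same solver on a fine uniform grid (1000 steps).
    target = euler_solve(toy_drift, x_init, timesteps_from_params([0.0] * 1000))
    xi = [0.0] * num_steps  # all-zero parameters give the uniform grid
    best = student_error(xi, x_init, target)
    uniform_error = best
    step = 0.5
    for _ in range(rounds):
        improved = False
        for i in range(num_steps):
            for delta in (step, -step):
                trial = list(xi)
                trial[i] += delta
                err = student_error(trial, x_init, target)
                if err < best:  # accept only strict improvements
                    xi, best, improved = trial, err, True
        if not improved:
            step *= 0.5  # refine the search once no move helps
    return xi, uniform_error, best

xi, uniform_error, learned_error = learn_discretization()
print("learned grid:", timesteps_from_params(xi))
print("uniform error:", uniform_error, "learned error:", learned_error)
```

Because the search accepts only moves that lower the error, the learned grid is never worse than the uniform one on this toy problem. A real setup would replace the scalar ODE with a pre-trained diffusion model and the coordinate search with gradient-based optimization.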
Peer Reviews
Decision·ICLR 2025 Oral
- The paper is well-written and easy to follow.
- It presents an easy solution to the sampling problem of diffusion models that only requires limited training time while obtaining.
- The soft teacher loss is effective and simple to implement.
- The evaluation is thorough and includes multiple models, multiple datasets, and multiple sampling strategies.
In general, I liked the paper and I lean toward acceptance. However, since this is not my area of expertise, I would wait for the discussion
Although I liked the paper, there are some concerns that, if addressed, would improve it. I describe them in detail below:
- In the tables with the main results, it is sometimes unclear what the metrics are computed against. I suppose the metrics in Tables 2, 3, 4, and 5 are computed against random samples of the model using the accurate estimation of the ODE. However, if this metric is computed against the true distribution, the performance of the teacher w
- The LD3 algorithm is extremely lightweight, requiring only 100 samples and less than 1 hour on a single GPU to learn optimized sampling schedules.
- The method is evaluated on a comprehensive set of pretrained models and compared against several baselines, showing improved quality in the majority of cases.
- A proper ablation study is done on the various choices/hyperparameters.
- There are several typos in the paper. See some examples below:
  - Algorithm 1, Line 6: $x'_T \leftarrow x'_T + ...$ must be $x'_T \leftarrow x_T + ...$
  - Line 251: $x'T \rightarrow x'_T$
  - Line 251: $\Psi*(x_T) \rightarrow \Psi_*(x_T)$
- Theorem 1 requires more explanation of its invertibility assumption. Specifically, if the NFE is small, the invertibility of the functions $\Psi_*$ and $\Psi_\xi$ is non-trivial and requires some justification.
- The method relies on a learned perceptual
- The paper includes proofs of soundness of their proposed minimization objectives, going beyond a purely empirical contribution.
- The number of experiments is substantial across datasets, baselines from prior work, and choices of pretrained models.
- The experiments include datasets that are notoriously difficult in the diffusion step reduction literature, such as ImageNet, and show improvement in more complex settings such as a text-to-image model.
- The objective is cheap to train compare
There are two main areas around which the paper could be much stronger. The first is in comparisons to distillation methods, which are among the strongest in the literature. The paper includes a comparison to progressive distillation and consistency distillation in Table 9, but it is really difficult to compare these methods apples-to-apples. There are details missing (please correct me if I missed these e.g. in the supplementary material) such as what models were compared and where are the base
Taxonomy
Topics: Advanced Numerical Analysis Techniques
Methods: Diffusion
