Simple and Fast Distillation of Diffusion Models
Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, Siwei Lyu

TL;DR
This paper introduces Simple and Fast Distillation (SFD), a method that significantly reduces fine-tuning time for diffusion models, enabling efficient high-quality image synthesis with variable NFEs from a single model.
Contribution
SFD simplifies existing distillation methods, reduces fine-tuning time by up to 1000×, and allows sampling with a variable NFE from a single distilled model.
Findings
Achieves 4.53 FID with NFE=2 on CIFAR-10
Reduces fine-tuning time to 0.64 hours on a single GPU
Balances sample quality and fine-tuning costs effectively
Abstract
Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000×. We begin with a vanilla distillation-based sampling method and boost its performance to state of the art by identifying and addressing several small yet vital factors affecting the…
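To make the NFE terminology concrete, here is a minimal sketch of how the cost of a sampler is counted: each solver step calls the network once, so the number of steps equals the NFE. This is not SFD's training procedure. The names `CountingDenoiser`, `euler_sample`, and `schedule` are hypothetical, the closed-form denoiser assumes toy 1-D data drawn from N(0, 1), and the ODE form dx/dt = (x - D(x, t)) / t is an assumed EDM-style parameterization.

```python
class CountingDenoiser:
    """Toy stand-in for a diffusion network that counts its own calls."""

    def __init__(self):
        self.nfe = 0  # number of function evaluations so far

    def __call__(self, x, t):
        self.nfe += 1
        # Assumption: data x0 ~ N(0, 1) and x = x0 + t * eps, so the ideal
        # denoiser is E[x0 | x] = x / (1 + t^2).
        return x / (1.0 + t * t)


def schedule(num_steps, t_max=80.0, t_min=0.002):
    """Geometric noise levels from t_max down to t_min (num_steps + 1 points)."""
    ratio = (t_min / t_max) ** (1.0 / num_steps)
    return [t_max * ratio**i for i in range(num_steps + 1)]


def euler_sample(denoiser, x, t_steps):
    """Euler integration of dx/dt = (x - D(x, t)) / t along t_steps."""
    for t_cur, t_next in zip(t_steps[:-1], t_steps[1:]):
        d = (x - denoiser(x, t_cur)) / t_cur  # one network call per step
        x = x + (t_next - t_cur) * d
    return x


# The same model can be sampled at different NFE by changing the schedule.
for n in (1, 2, 4):
    model = CountingDenoiser()
    sample = euler_sample(model, 80.0, schedule(n))
    print(f"steps={n} nfe={model.nfe} sample={sample:.4f}")
```

Fewer steps mean fewer network calls but a coarser approximation of the sampling trajectory, and closing that quality gap at low NFE is what distillation-based methods aim for.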
Peer Reviews
Decision: NeurIPS 2024 poster
The paper is easy to read and follow. It shows effectiveness on Stable Diffusion and is also much faster to train for large-scale Stable Diffusion. It is interesting to see that fixing the model at some part improves the model over all other parts.
What is the effect of SFD on diversity w.r.t. the distilled model? It might be easy to have high quality but much lower diversity. The proposed unrolling of the student model (global trajectory optimization) is effectively multi-step training, as in structured prediction and imitation learning, which has been shown to be prone to mode collapse. The justification of the method is still unclear, and it is also not clear how sensitive SFD is to the training-time noise schedule/time step weighting, i.e., the forward diffusion process and it
Taxonomy
Topics: Process Optimization and Integration · Advanced Control Systems Optimization · Field-Flow Fractionation Techniques
Methods: Diffusion
