Simple and Fast Distillation of Diffusion Models

Zhenyu Zhou; Defang Chen; Can Wang; Chun Chen; Siwei Lyu

arXiv:2409.19681·cs.CV·October 1, 2024

TL;DR

This paper introduces Simple and Fast Distillation (SFD), a method that significantly reduces fine-tuning time for diffusion models, enabling efficient high-quality image synthesis with variable NFEs from a single model.

Contribution

SFD simplifies the paradigm used in existing distillation methods, shortens fine-tuning time by up to 1000 times, and allows sampling at variable NFEs with a single distilled model.

Findings

01. Achieves 4.53 FID with NFE=2 on CIFAR-10

02. Reduces fine-tuning time to 0.64 hours on a single GPU

03. Balances sample quality and fine-tuning costs effectively

Abstract

Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000$\times$. We begin with a vanilla distillation-based sampling method and boost its performance to state of the art by identifying and addressing several small yet vital factors affecting the…

Peer Reviews

Decision·NeurIPS 2024 poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper is easy to read and follow. It shows effectiveness on Stable Diffusion and is much faster to train at that large scale too. It is interesting to see that fixing the model at some part improves the model over all other parts.

Weaknesses

What is the effect of SFD on diversity with respect to the distilled model? It might be easy to obtain high quality but much lower diversity. The proposed unrolling of the student model (global trajectory optimization) is effectively multi-step training, as in structured prediction and imitation learning, which has been shown to be prone to mode collapse. The justification of the method is still unclear, and it is also unclear how sensitive SFD is to the training-time noise schedule and time-step weighting, i.e., the forward diffusion process, and it

Code & Models

Videos

Simple and Fast Distillation of Diffusion Models · SlidesLive

Taxonomy

Topics: Process Optimization and Integration · Advanced Control Systems Optimization · Field-Flow Fractionation Techniques

Methods: Diffusion