Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine

TL;DR
This paper separates diffusion-based generative models into independent design choices (sampling, training, and score-network preconditioning) and improves each one, yielding state-of-the-art FID scores with much faster sampling.
Contribution
The paper introduces a design space that clearly separates the concrete design choices of diffusion models, so that sampling, training, and score-network preconditioning can each be improved independently.
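To make the preconditioning idea concrete, here is a minimal sketch of how a raw network can be wrapped so that its inputs and outputs stay well scaled across noise levels. The coefficient formulas and the value `sigma_data = 0.5` are recalled from the paper and are not stated on this page, so treat them as assumptions; `raw_net`, `edm_precond_coeffs`, and `precondition` are hypothetical names used only for illustration.

```python
import math

SIGMA_DATA = 0.5  # assumed data standard deviation (not stated on this page)


def edm_precond_coeffs(sigma, sigma_data=SIGMA_DATA):
    """Return (c_skip, c_out, c_in, c_noise) for a noise level sigma > 0."""
    total = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / total               # weight of the skip connection
    c_out = sigma * sigma_data / math.sqrt(total)  # scale of the network output
    c_in = 1.0 / math.sqrt(total)                  # scale of the network input
    c_noise = math.log(sigma) / 4.0                # noise-level conditioning value
    return c_skip, c_out, c_in, c_noise


def precondition(raw_net, x, sigma):
    """Denoiser D(x; sigma) = c_skip * x + c_out * raw_net(c_in * x, c_noise)."""
    c_skip, c_out, c_in, c_noise = edm_precond_coeffs(sigma)
    return c_skip * x + c_out * raw_net(c_in * x, c_noise)


if __name__ == "__main__":
    # With a raw network that always outputs zero, only the skip path remains.
    print(precondition(lambda scaled_x, noise_cond: 0.0, 2.0, 0.5))
```

Under these assumptions, at low noise the skip connection dominates and the network only predicts a small correction, while at high noise the network output dominates and it predicts the clean signal almost directly.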
Findings
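Achieved new state-of-the-art FID on CIFAR-10 (1.79 class-conditional, 1.97 unconditional) and on ImageNet-64.
Reduced sampling cost to 35 network evaluations per image, much faster than prior designs.
Improved both the efficiency and quality of pre-trained score networks from previous work; for example, the FID of a previously trained ImageNet-64 model improved from 2.07 to a near-SOTA 1.55.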
Abstract
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after…
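The figure of 35 network evaluations per image can be illustrated with a small sketch of a second-order (Heun) sampler on a polynomial noise schedule. The schedule formula and the defaults (`n_steps=18`, `sigma_min=0.002`, `sigma_max=80`, `rho=7`) are recalled from the paper's CIFAR-10 setup and do not appear on this page, so treat them as assumptions. The function names and the scalar toy denoiser are hypothetical stand-ins for a trained network.

```python
import random


def karras_sigmas(n_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing noise levels sigma_0 > ... > sigma_{N-1}, followed by a final 0."""
    inv_max = sigma_max ** (1.0 / rho)
    inv_min = sigma_min ** (1.0 / rho)
    sigmas = [
        (inv_max + i / (n_steps - 1) * (inv_min - inv_max)) ** rho
        for i in range(n_steps)
    ]
    return sigmas + [0.0]


def heun_sample(denoise, x, sigmas):
    """Deterministic Heun sampler; returns (sample, number of denoiser calls)."""
    calls = 0
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Euler step along dx/dsigma = (x - D(x; sigma)) / sigma.
        d_cur = (x - denoise(x, s_cur)) / s_cur
        calls += 1
        x_next = x + (s_next - s_cur) * d_cur
        if s_next > 0:
            # Second-order correction, skipped on the final step to sigma = 0.
            d_next = (x_next - denoise(x_next, s_next)) / s_next
            calls += 1
            x_next = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x, calls


if __name__ == "__main__":
    sigmas = karras_sigmas()
    # Toy denoiser: the "dataset" is the single value 1.5, so the ideal
    # denoiser returns 1.5 at every noise level.
    x0 = sigmas[0] * random.gauss(0.0, 1.0)
    sample, calls = heun_sample(lambda x, sigma: 1.5, x0, sigmas)
    print(sample, calls)  # 18 steps give 2 * 18 - 1 = 35 denoiser calls
```

Each of the 18 steps costs two denoiser calls except the last, which costs one, so the total is 35. The toy denoiser only checks the bookkeeping; sample quality depends entirely on the trained network.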
Code & Models
- 🤗 playgroundai/playground-v2.5-1024px-aesthetic (model) · 312k dl · ♡ 760
- 🤗 dg845/consistency-model-pipelines (model) · 28 dl · ♡ 1
- 🤗 dg845/diffusers-cd_bedroom256_l2 (model) · 5 dl
- 🤗 dg845/diffusers-cd_cat256_l2 (model) · 2 dl
- 🤗 dg845/diffusers-cd_imagenet64_lpips (model) · 6 dl
- 🤗 dg845/diffusers-cd_bedroom256_lpips (model) · 1 dl
- 🤗 dg845/diffusers-cd_cat256_lpips (model) · 5 dl
- 🤗 openai/diffusers-cd_cat256_lpips (model) · 8 dl · ♡ 3
- 🤗 openai/diffusers-cd_bedroom256_lpips (model) · 24 dl · ♡ 3
- 🤗 openai/diffusers-cd_cat256_l2 (model) · 11 dl · ♡ 1
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Cell Image Analysis Techniques
