Diffusion Models, Denoiser Architecture and Creativity
Itamar Levine, Yair Weiss

TL;DR
This paper investigates how the architecture of denoisers in diffusion models influences their ability to generate creative and realistic images, combining empirical experiments with theoretical analysis.
Contribution
It provides explicit theoretical formulas for sample distributions based on denoiser architecture and demonstrates how small architectural changes affect creativity and realism.
Findings
Small architectural changes in denoisers drastically alter generated sample diversity.
The success of diffusion models depends on the alignment between denoiser architecture and the target distribution.
Explicit formulas for the distribution of generated samples are derived for three denoiser architectures: linear, polynomial, and bottleneck.
Abstract
The creativity of diffusion models refers to their ability to generate highly realistic images that are different from their training data. Creativity is somewhat surprising since it is known that if the denoiser used in the diffusion model is the Bayes optimal denoiser for a given training set, then the model will simply copy the training samples. In this paper we present empirical and theoretical results that suggest that creativity in diffusion models is due to an interaction between the denoiser architecture and the target distribution. Theoretically, we give explicit forms for the distribution of generated samples as a function of the target distribution and the denoiser architecture for three different denoiser architectures (linear, polynomial, bottleneck). Empirically, we show that small changes in the popular UNET denoiser architecture lead to very different forms of…
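The abstract's remark that a Bayes optimal denoiser copies the training set can be illustrated with a minimal sketch. It assumes additive Gaussian noise, y = x + sigma * n, and a uniform prior over a finite training set, in which case the posterior mean is a softmax-weighted average of the training points. The function name `empirical_bayes_denoiser` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def empirical_bayes_denoiser(y, train, sigma):
    """Posterior mean E[x | y], assuming x is uniform over the rows of
    `train` and y = x + sigma * standard Gaussian noise."""
    # Squared distance from the noisy input to every training point.
    sq_dist = np.sum((train - y) ** 2, axis=1)
    # Log-posterior weights up to a constant; subtract the max for stability.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Weighted average of the training points.
    return weights @ train

# Toy training set (illustrative only).
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
y = np.array([0.9, 0.1])

# Low noise: the weights collapse onto the nearest training point.
denoised_small = empirical_bayes_denoiser(y, train, sigma=0.05)
# High noise: the weights are nearly uniform, so the output is near the mean.
denoised_large = empirical_bayes_denoiser(y, train, sigma=100.0)
```

At low noise levels this denoiser snaps its input onto the nearest training point, which is the copying behavior the abstract describes. A denoiser with a restricted architecture cannot in general represent this function, and that gap is where the paper locates creativity.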
