Origins of Creativity in Attention-Based Diffusion Models
Emma Finn, T. Anderson Keller, Manos Theodosis, Demba E. Ba

TL;DR
This paper explores how self-attention mechanisms in diffusion models influence the emergence of globally coherent and creative image generation, extending existing theories from CNN-based models to those with self-attention.
Contribution
It extends the theoretical understanding of diffusion models by analyzing the role of self-attention in producing globally consistent and creative images.
Findings
Self-attention induces globally image-consistent arrangements of features.
Empirical verification on a crafted dataset supports the theory.
Diffusion models with self-attention generate more coherent images than CNN-only models.
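The contrast between CNN-only and self-attention models rests partly on receptive field. A convolution's output at one position depends only on a small neighborhood, while a self-attention output depends on every position. The minimal sketch below checks that dependency pattern on random features. It illustrates only this architectural difference, not the paper's analysis. The helper names (`conv1d_same`, `self_attention`), the kernel width of 3, and the single unmasked attention head with random projection matrices are all hypothetical choices.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Local 1D cross-correlation with zero 'same' padding; x has shape (T, d).
    The same scalar kernel is applied to every channel."""
    k = len(kernel) // 2
    padded = np.pad(x, ((k, k), (0, 0)))
    return np.array([kernel @ padded[t:t + len(kernel)] for t in range(x.shape[0])])


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all T positions."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v


rng = np.random.default_rng(0)
T, d = 16, 4
x = rng.standard_normal((T, d))
kernel = rng.standard_normal(3)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

x_far = x.copy()
x_far[-1] += 1.0  # perturb only the last position, far from position 0

# How much does the output at position 0 change under the far-away perturbation?
conv_change = np.abs(conv1d_same(x_far, kernel)[0] - conv1d_same(x, kernel)[0]).max()
attn_change = np.abs(self_attention(x_far, w_q, w_k, w_v)[0]
                     - self_attention(x, w_q, w_k, w_v)[0]).max()
print(conv_change, attn_change)  # conv_change is exactly 0.0; attn_change is nonzero
```

Stacking convolutions widens the receptive field, but each layer still only mixes neighbors. A single attention layer already couples every position to every other, which is the kind of global interaction the findings above associate with coherent arrangements of features.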
Abstract
As diffusion models have become the tool of choice for image generation and as the quality of the images continues to improve, the question of how "creativity" originates in diffusion has become increasingly important. The score matching perspective on diffusion has proven particularly fruitful for understanding how and why diffusion models generate images that remain plausible while differing significantly from their training images. In particular, as explained by Kamb & Ganguli (2024) and others (e.g., Ambrogioni, 2023), theory suggests that if our score matching were optimal, we would only be able to recover training samples through our diffusion process. However, as shown by Kamb & Ganguli (2024), in diffusion models where the score is parametrized by a simple CNN, the inductive biases of the CNN itself (translation equivariance and locality) allow the model to generate samples…
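The claim that an optimal score would only recover training samples can be seen in a toy setting. The sketch below is an illustration under stated assumptions, not the authors' code.

- It assumes a variance-exploding forward process x = x0 + sigma * eps.
- It treats the data distribution as exactly the empirical distribution of a four-point "training set".
- It computes the score of the resulting Gaussian mixture in closed form via Tweedie's formula.
- It integrates the probability-flow ODE with Euler steps.

The function names, the toy points, and the noise schedule (sigma from 10 down to 0.01, then a final step to 0) are hypothetical choices.

```python
import numpy as np


def optimal_denoiser(x, train, sigma):
    """E[x0 | x] when x = x0 + sigma * eps and x0 is drawn uniformly from `train`."""
    sq_dist = np.sum((train - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    w = np.exp(logits - logits.max())  # softmax over training points, numerically stable
    w /= w.sum()
    return w @ train


def optimal_score(x, train, sigma):
    """Exact score of the noised empirical distribution, via Tweedie's formula."""
    return (optimal_denoiser(x, train, sigma) - x) / sigma**2


def sample_with_optimal_score(train, rng, sigma_max=10.0, sigma_min=0.01, steps=200):
    """Euler integration of the variance-exploding probability-flow ODE
    dx/dsigma = -sigma * score(x, sigma), from sigma_max down to 0."""
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = sigma_max * rng.standard_normal(train.shape[1])
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_lo - s_hi) * (-s_hi * optimal_score(x, train, s_hi))
    return x


train = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])  # toy "training set"
rng = np.random.default_rng(0)
samples = np.array([sample_with_optimal_score(train, rng) for _ in range(20)])

# Distance from each generated sample to its nearest training point
nearest = np.min(np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=2), axis=1)
print(nearest.max())
```

Every sample ends on one of the four training points, up to floating-point error, rather than on a new point between them. A learned score with inductive biases such as translation equivariance and locality departs from this exact score, and that gap is what allows novel samples.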
