An analytic theory of creativity in convolutional diffusion models
Mason Kamb, Surya Ganguli

TL;DR
This paper develops an analytic, interpretable theory of how convolutional diffusion models generate highly original images. It attributes this creativity to two inductive biases, locality and equivariance, and shows that they give rise to a patch mosaic mechanism.
Contribution
It introduces a mechanistic local score model that predicts diffusion model outputs and uncovers a patch mosaic process underlying creativity in image generation.
Findings
Achieves high predictive accuracy on various datasets (median r^2 > 0.94).
Reveals a local patch mosaic mechanism for creativity in diffusion models.
Partially predicts the outputs of attention-enabled UNets, suggesting that attention contributes to semantic coherence.
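The r^2 figure above can be read as an agreement score between a theory-predicted image and the image a trained model actually produces. The sketch below assumes the standard coefficient of determination computed over pixels for each image pair, followed by a median over pairs. The function names `r_squared` and `median_r_squared` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def r_squared(predicted, target):
    """Coefficient of determination between two images (assumed definition)."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    ss_res = np.sum((target - predicted) ** 2)       # residual sum of squares
    ss_tot = np.sum((target - target.mean()) ** 2)   # total variance of the target
    return 1.0 - ss_res / ss_tot

def median_r_squared(predicted_batch, target_batch):
    """Median per-image r^2 over a batch of (prediction, target) pairs."""
    scores = [r_squared(p, t) for p, t in zip(predicted_batch, target_batch)]
    return float(np.median(scores))

# Illustrative data only: random "model outputs" and two candidate predictions.
rng = np.random.default_rng(0)
targets = rng.normal(size=(5, 8, 8))
perfect = targets.copy()                                        # exact prediction
constant = np.stack([np.full_like(t, t.mean()) for t in targets])  # mean-only prediction
```

Under this definition, an exact prediction scores 1 and a prediction that only matches the mean intensity of the target scores 0, so a median above 0.94 would indicate close pixel-level agreement for most samples.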
Abstract
We obtain an analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-matching diffusion models can generate highly original images that lie far from their training data. However, optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in fully analytic, completely mechanistically interpretable, local score (LS) and equivariant local score (ELS) machines that, (3) after calibrating a single time-dependent hyperparameter, can quantitatively predict the outputs of trained convolution-only diffusion models (like ResNets and UNets) with high accuracy (median of $0.95,…
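The contrast the abstract draws between optimal score-matching and a locality-constrained score can be illustrated with a small sketch. It assumes a variance-preserving forward process, x_t = sqrt(alpha_bar) x_0 + sqrt(1 - alpha_bar) noise, and a finite training set. The names `ideal_denoiser`, `local_denoiser`, `score_from_denoiser`, `alpha_bar`, and `radius` are hypothetical, and the fixed patch radius and zero-padded borders are illustrative choices. This is a simplified toy in the spirit of a local score machine, not the authors' LS or ELS implementation (in particular, it is not equivariant: patches are compared only at the same image location).

```python
import numpy as np

def ideal_denoiser(x, train, alpha_bar):
    """Posterior mean of x_0 given noisy x, for an empirical training set.

    x: (H, W) noisy image; train: (N, H, W) training images.
    One softmax weight per training image, computed from the whole image.
    """
    diff = x[None] - np.sqrt(alpha_bar) * train
    logits = -np.sum(diff ** 2, axis=(1, 2)) / (2.0 * (1.0 - alpha_bar))
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, train, axes=1)  # weighted average, shape (H, W)

def local_denoiser(x, train, alpha_bar, radius=1):
    """Like ideal_denoiser, but each pixel gets its own softmax weights,
    computed only from the (2*radius+1)^2 patch around that pixel."""
    n, h, w_ = train.shape
    sq = (x[None] - np.sqrt(alpha_bar) * train) ** 2
    # Zero padding: border patches only count in-bounds pixels (an assumption).
    padded = np.pad(sq, ((0, 0), (radius, radius), (radius, radius)))
    patch_sq = np.zeros_like(sq)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            patch_sq += padded[:, dy:dy + h, dx:dx + w_]
    logits = -patch_sq / (2.0 * (1.0 - alpha_bar))
    logits -= logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)
    return np.sum(w * train, axis=0)  # per-pixel weighted average

def score_from_denoiser(x, x0_hat, alpha_bar):
    """Convert a denoised estimate into a score estimate for x_t."""
    return (np.sqrt(alpha_bar) * x0_hat - x) / (1.0 - alpha_bar)

# Toy demo: two constant training images and a query that mixes them.
A = np.ones((8, 8))
B = -np.ones((8, 8))
train = np.stack([A, B])
x = np.ones((8, 8))
x[:, 5:] = -1.0          # left 5 columns look like A, right 3 look like B
alpha_bar = 0.99         # low noise level

ideal = ideal_denoiser(x, train, alpha_bar)   # collapses to the nearest training image, A
local = local_denoiser(x, train, alpha_bar)   # keeps A on the left and B on the right
```

In this toy setting the whole-image denoiser returns the single closest training example, while the patch-based denoiser assembles a mosaic that matches different training images in different regions, an image that is not in the training set.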
Taxonomy
Topics: Creativity in Education and Neuroscience
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Diffusion · Convolution
