How Diffusion Models Learn to Factorize and Compose
Qiyao Liang, Ziming Liu, Mitchell Ostrow, Ila Fiete

TL;DR
This paper investigates how diffusion models learn to represent and compose features. It finds that the models develop factorized representations that enhance compositionality but limit interpolation, and it links the formation of these representations to percolation theory.
Contribution
The study provides a controlled experimental analysis of diffusion models' internal representations, showing their ability to learn factorized features and the conditions for compositional generalization.
Findings
Models learn factorized but not fully continuous representations.
Few examples suffice for models to attain compositionality.
Manifold formation relates to percolation theory, explaining sudden learning transitions.
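The percolation finding can be made concrete with a generic connectivity sketch. The code below is an illustration of a percolation-style threshold in general, not the paper's analysis: the function name `largest_cluster_fraction`, the grid of points, and the use of a `radius` as a stand-in for how far one training example "reaches" are all assumptions made for this example.

```python
def largest_cluster_fraction(points, radius):
    """Fraction of points in the largest cluster, where two points are
    linked whenever their Euclidean distance is at most `radius`."""
    if not points:
        return 0.0
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                parent[find(i)] = find(j)  # merge the two clusters

    sizes = {}
    for i in range(len(points)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(points)


# Hypothetical 4 x 4 grid of sample locations with spacing 4.
grid = [(x, y) for x in range(0, 16, 4) for y in range(0, 16, 4)]
below = largest_cluster_fraction(grid, 3.9)  # every point is isolated
above = largest_cluster_fraction(grid, 4.0)  # neighbours link up into one cluster
```

On this toy grid, a small change in `radius` around the grid spacing moves the largest cluster from a single point to the whole set, which is the kind of abrupt transition that percolation theory describes.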
Abstract
Diffusion models are capable of generating photo-realistic images that combine elements which likely do not appear together in the training set, demonstrating the ability to \textit{compositionally generalize}. Nonetheless, the precise mechanism of compositionality and how it is acquired through training remains elusive. Inspired by cognitive neuroscientific approaches, we consider a highly reduced setting to examine whether and when diffusion models learn semantically meaningful and factorized representations of composable features. We performed extensive controlled experiments on conditional Denoising Diffusion Probabilistic Models (DDPMs) trained to generate various forms of 2D Gaussian bump images. We found that the models learn factorized but not fully continuous manifold representations for encoding continuous features of variation underlying the data. With such representations,…
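To make the experimental setting in the abstract easier to picture, here is a minimal sketch of how a 2D Gaussian bump image and its conditioning label could be generated. The image size, bump width `sigma`, grid `step`, and the function names `gaussian_bump` and `make_dataset` are assumptions chosen for illustration, not values or code taken from the paper.

```python
import math


def gaussian_bump(x0, y0, size=32, sigma=1.0):
    """Return a size x size grayscale image as nested lists (row-major,
    image[y][x]) with one 2D Gaussian bump centered at pixel (x0, y0).
    Pixel values lie in [0, 1], with the peak value 1.0 at the center."""
    return [
        [
            math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]


def make_dataset(size=32, sigma=1.0, step=4):
    """Pair each image with its (x0, y0) label, the form of input a
    conditional generative model would be trained on."""
    centers = [
        (x0, y0)
        for x0 in range(0, size, step)
        for y0 in range(0, size, step)
    ]
    return [((x0, y0), gaussian_bump(x0, y0, size, sigma)) for x0, y0 in centers]


dataset = make_dataset()
label, image = dataset[0]
```

Here the two coordinates of the bump center play the role of the independent, composable features: a factorized representation would encode `x0` and `y0` separately rather than memorizing each `(x0, y0)` pair.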
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science · Face Recognition and Perception
Methods: Diffusion
