Dynamical Regimes of Multimodal Diffusion Models
Emil Albrychiewicz, Andrés Franco Valiente, and Li-Ching Chen

TL;DR
This paper develops a theoretical framework for multimodal diffusion models, revealing how spectral interaction timescales influence generation and explaining desynchronization artifacts, supported by experiments on MNIST.
Contribution
It introduces a spectral hierarchy model for coupled diffusion processes, predicting synchronization gaps and providing analytical bounds for stable multimodal generation.
Findings
Spectral hierarchy governs multimodal diffusion dynamics.
Synchronization gap explains desynchronization artifacts.
Coupling strength acts as a spectral filter for temporal control.
Abstract
Diffusion-based generative models have achieved unprecedented fidelity in synthesizing high-dimensional data, yet the theoretical mechanisms governing multimodal generation remain poorly understood. Here, we present a theoretical framework for coupled diffusion models, using coupled Ornstein-Uhlenbeck processes as a tractable model. Using the nonequilibrium statistical physics of dynamical phase transitions, we demonstrate that multimodal generation is governed by a spectral hierarchy of interaction timescales rather than simultaneous resolution. A key prediction is the "synchronization gap", a temporal window during the reverse generative process in which distinct eigenmodes stabilize at different rates, providing a theoretical explanation for common desynchronization artifacts. We derive analytical conditions for speciation and collapse times under both symmetric and anisotropic…
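To make the idea of eigenmodes relaxing at different rates concrete, here is a minimal sketch of two linearly coupled Ornstein-Uhlenbeck variables. It is an illustration only, not the paper's model: the drift matrix [[1, -g], [-g, 1]], the coupling parameter `g`, and the function names `mode_rates` and `simulate` are all assumptions introduced here, and the sketch covers only a forward process, not the reverse generative dynamics. Under these assumptions the common mode (x1 + x2) relaxes at rate 1 - g and the difference mode (x1 - x2) at rate 1 + g, so a stronger coupling widens the separation between the two timescales.

```python
import math
import random


def mode_rates(g):
    """Relaxation rates (common mode, difference mode) for the assumed
    drift matrix [[1, -g], [-g, 1]] with coupling 0 <= g < 1."""
    if not 0.0 <= g < 1.0:
        raise ValueError("coupling g must satisfy 0 <= g < 1")
    return 1.0 - g, 1.0 + g


def simulate(g, x0, dt=1e-3, steps=2000, noise=1.0, seed=0):
    """Euler-Maruyama integration of the assumed coupled OU system
        dx1 = -(x1 - g*x2) dt + sqrt(2) * noise * dW1
        dx2 = -(x2 - g*x1) dt + sqrt(2) * noise * dW2
    Returns the final state (x1, x2). Set noise=0 for the deterministic drift."""
    rng = random.Random(seed)
    x1, x2 = x0
    scale = noise * math.sqrt(2.0 * dt)
    for _ in range(steps):
        d1 = -(x1 - g * x2)
        d2 = -(x2 - g * x1)
        x1, x2 = (
            x1 + d1 * dt + scale * rng.gauss(0.0, 1.0),
            x2 + d2 * dt + scale * rng.gauss(0.0, 1.0),
        )
    return x1, x2


if __name__ == "__main__":
    g = 0.5
    slow, fast = mode_rates(g)
    print("relaxation times:", 1.0 / slow, 1.0 / fast)
    # Noise-free run: project the final state onto the two eigenmodes.
    x1, x2 = simulate(g, (1.0, 0.0), noise=0.0)
    print("common mode:", x1 + x2, "difference mode:", x1 - x2)
```

With the noise switched off, the two projections decay as exp(-(1 - g) t) and exp(-(1 + g) t), so the difference mode settles well before the common mode does. This toy separation of timescales is only meant to illustrate the kind of spectral hierarchy the abstract refers to.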
Taxonomy
Topics: Quantum many-body systems · Generative Adversarial Networks and Image Synthesis · Language and cultural evolution
