Decentralized Diffusion Models
David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

TL;DR
Decentralized Diffusion Models enable scalable training of diffusion models across independent clusters without a centralized high-bandwidth network, preserving model quality while reducing infrastructure costs.
Contribution
We introduce a decentralized training framework for diffusion models that eliminates the need for centralized high-bandwidth networks, allowing training across independent clusters with ensemble inference.
Findings
Decentralized diffusion models outperform standard models on ImageNet and LAION datasets.
The approach scales to 24 billion parameters using only eight GPU nodes.
Training time is reduced to less than a week for large models.
Abstract
Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up infrastructure costs and straining power systems. We propose Decentralized Diffusion Models, a scalable framework for distributing diffusion model training across independent clusters or datacenters by eliminating the dependence on a centralized, high-bandwidth networking fabric. Our method trains a set of expert diffusion models over partitions of the dataset, each in full isolation from one another. At inference time, the experts ensemble through a lightweight router. We show that the ensemble collectively optimizes the same objective as a single model trained over the whole dataset. This means we can divide the training burden among a number of "compute…
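The claim that the ensemble optimizes the same objective as a single model can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes a 1D dataset of finitely many points, Gaussian noising x_t = alpha * x0 + sigma * eps, and experts that are the exact closed-form optimal denoisers E[x0 | x_t] for their own partition. The router is also computed in closed form as the posterior p(k | x_t), whereas in the paper it is a lightweight learned network. All function names are hypothetical. Weighting each expert by the router's posterior reproduces the denoiser of one model fit to the whole dataset.

```python
import numpy as np


def log_kernel(x_t, points, alpha, sigma):
    # Unnormalized log N(x_t; alpha * x0, sigma^2) for every data point x0
    # (constants shared by all points are dropped; they cancel below).
    return -((x_t - alpha * points) ** 2) / (2.0 * sigma**2)


def optimal_denoiser(x_t, points, alpha, sigma):
    # E[x0 | x_t] when x0 is uniform over a finite set of points:
    # a softmax-weighted average of the points.
    logw = log_kernel(x_t, points, alpha, sigma)
    w = np.exp(logw - logw.max())
    return float(np.sum(w * points) / np.sum(w))


def router_weights(x_t, partitions, alpha, sigma):
    # Posterior p(k | x_t) over partitions. With prior n_k / N and a uniform
    # distribution over each partition's n_k points, p(k | x_t) is proportional
    # to the summed kernel values of partition k.
    scores = np.array(
        [np.logaddexp.reduce(log_kernel(x_t, p, alpha, sigma)) for p in partitions]
    )
    w = np.exp(scores - scores.max())
    return w / w.sum()


def ensemble_denoiser(x_t, partitions, alpha, sigma):
    # Router-weighted mixture of independently "trained" per-partition experts.
    weights = router_weights(x_t, partitions, alpha, sigma)
    experts = np.array([optimal_denoiser(x_t, p, alpha, sigma) for p in partitions])
    return float(np.dot(weights, experts))


rng = np.random.default_rng(0)
data = rng.normal(size=12)
partitions = [data[:5], data[5:]]  # deliberately uneven split
alpha, sigma = 0.8, 0.6

for x_t in [-1.5, 0.0, 0.7, 2.0]:
    full = optimal_denoiser(x_t, data, alpha, sigma)
    mixed = ensemble_denoiser(x_t, partitions, alpha, sigma)
    print(f"x_t={x_t:+.1f}  single model={full:.6f}  ensemble={mixed:.6f}")
```

Each expert touches only its own partition, so in this toy setting the experts can be fit with no communication between them. The 5/7 split is intentionally uneven: the partition-size prior n_k / N cancels against each partition's 1/n_k normalization, which is why the router sums raw kernel values rather than averaging them.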
Taxonomy
Topics: Differential Equations and Numerical Methods
Methods: Sparse Evolutionary Training · Diffusion
