Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism
Zhiyuan Wu, Shuai Wang, Li Chen, Kaihui Gao, Dan Li, Yanyu Ren, Qiming Zhang, Yong Wang

TL;DR
This paper introduces Latent Parallelism (LP), a parallelism strategy for video diffusion models that cuts communication overhead by up to 97% relative to baseline methods while maintaining comparable generation quality, enabling video generation to scale across multiple GPUs.
Contribution
The paper proposes Latent Parallelism (LP), a parallelism method tailored for video diffusion model (VDM) serving that exploits local spatio-temporal dependencies to reduce communication costs.
Findings
LP reduces communication overhead by up to 97% compared to baseline methods.
LP maintains comparable generation quality to existing strategies.
LP can be seamlessly integrated with existing parallelism techniques.
Abstract
Video diffusion models (VDMs) perform attention computation over the 3D spatio-temporal domain. Compared to large language models (LLMs) processing 1D sequences, their memory consumption scales cubically, necessitating parallel serving across multiple GPUs. Traditional parallelism strategies partition the computational graph, requiring frequent high-dimensional activation transfers that create severe communication bottlenecks. To tackle this issue, we exploit the local spatio-temporal dependencies inherent in the diffusion denoising process and propose Latent Parallelism (LP), the first parallelism strategy tailored for VDM serving. LP decomposes the global denoising problem into parallelizable sub-problems by dynamically rotating the partitioning dimensions (temporal, height, and width) within the compact latent space across diffusion timesteps, substantially reducing…
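The rotation of partitioning dimensions described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the round-robin schedule, the function names (`partition_axis`, `split_bounds`, `sub_latents`), and the even contiguous split are all assumptions made for illustration, and the sketch ignores how sub-latents are reassembled or how boundary regions are handled, since the abstract is truncated before those details.

```python
from typing import List, Tuple

# Partitioning dimensions named in the abstract, in (T, H, W) order.
AXES = ("temporal", "height", "width")


def partition_axis(step: int) -> int:
    """Hypothetical round-robin schedule: which axis to split at this timestep."""
    return step % len(AXES)


def split_bounds(length: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(length) into `parts` contiguous, near-equal [start, stop) chunks."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sub_latents(shape: Tuple[int, int, int], step: int, num_gpus: int):
    """Return, per GPU, the (T, H, W) index ranges of its sub-latent at `step`."""
    axis = partition_axis(step)
    regions = []
    for start, stop in split_bounds(shape[axis], num_gpus):
        # Each GPU holds the full extent of the two unsplit axes.
        region = [(0, n) for n in shape]
        region[axis] = (start, stop)
        regions.append(tuple(region))
    return regions


if __name__ == "__main__":
    latent_shape = (16, 60, 104)  # assumed (T, H, W) latent size, for illustration only
    for step in range(3):
        print(AXES[partition_axis(step)], sub_latents(latent_shape, step, num_gpus=4))
```

Under these assumptions, each GPU denoises a slab of the latent that is cut along a different axis at each successive timestep, so no single axis carries a fixed partition boundary for the whole denoising run.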
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
