Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism
Kunyun Wang, Bohan Li, Kai Yu, Minyi Guo, Jieru Zhao

TL;DR
This paper introduces ParaStep, a communication-efficient parallelization method for diffusion models that significantly reduces inference latency by exploiting similarity between denoising steps, enabling faster generation without quality loss.
Contribution
ParaStep is a novel parallelization approach that uses a reuse-then-predict mechanism with lightweight communication to accelerate diffusion inference.
Findings
Achieves up to 6.56x speedup on AudioLDM2-large.
Reduces communication overhead compared to prior methods.
Maintains generation quality while significantly speeding up inference.
Abstract
Diffusion models have emerged as a powerful class of generative models across various modalities, including image, video, and audio synthesis. However, their deployment is often limited by significant inference latency, primarily due to the inherently sequential nature of the denoising process. While existing parallelization strategies attempt to accelerate inference by distributing computation across multiple devices, they typically incur high communication overhead, hindering deployment on commercial hardware. To address this challenge, we propose ParaStep, a novel parallelization method based on a reuse-then-predict mechanism that parallelizes diffusion inference by exploiting similarity between adjacent denoising steps. Unlike prior approaches that rely on layer-wise or stage-wise communication, ParaStep employs lightweight, step-wise communication, substantially reducing…
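The premise that adjacent denoising steps are similar can be illustrated with a small, single-process toy. The sketch below is not ParaStep and does not reproduce the authors' algorithm or any multi-device communication; `toy_denoiser`, `update`, and `run` are hypothetical names, and the "model" is an arbitrary smooth function chosen only so the example runs. It measures the cosine similarity between noise estimates at consecutive steps and shows that reusing the previous step's estimate on every other step changes the final sample only slightly.

```python
import math


def toy_denoiser(x, t):
    """Hypothetical stand-in for a noise-prediction network eps(x, t)."""
    return [math.tanh(v) * (0.5 + 0.5 * t) for v in x]


def update(x, eps, dt):
    """Simplified sampler update: move the sample against the noise estimate."""
    return [v - dt * e for v, e in zip(x, eps)]


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def run(x, num_steps, reuse_every_other=False):
    """Run the toy denoising loop.

    Returns the final sample, the cosine similarities between consecutive
    freshly computed noise estimates and their predecessors, and the number
    of denoiser calls.
    """
    dt = 1.0 / num_steps
    eps_prev = None
    sims = []
    model_calls = 0
    for i in range(num_steps):
        t = 1.0 - i * dt
        if reuse_every_other and eps_prev is not None and i % 2 == 1:
            # Assumption for illustration: skip the model and reuse the
            # previous step's noise estimate unchanged.
            eps = eps_prev
        else:
            eps = toy_denoiser(x, t)
            model_calls += 1
            if eps_prev is not None:
                sims.append(cosine_similarity(eps_prev, eps))
        x = update(x, eps, dt)
        eps_prev = eps
    return x, sims, model_calls


if __name__ == "__main__":
    start = [1.0, -0.5, 2.0, 0.3]
    base, sims, calls = run(start, 50)
    fast, _, fast_calls = run(start, 50, reuse_every_other=True)
    gap = max(abs(a - b) for a, b in zip(base, fast))
    print("min adjacent-step similarity:", min(sims))
    print("denoiser calls:", calls, "vs", fast_calls)
    print("max difference in final sample:", gap)
```

In this toy the saving comes from skipping denoiser calls on one device. The abstract describes something different in kind: distributing steps across devices with step-wise communication, which this sketch does not attempt to model.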
Taxonomy
Topics: Advanced Computing and Algorithms · Image and Signal Denoising Methods · Neural Networks and Applications
Methods: Diffusion
