Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

Kunyun Wang; Bohan Li; Kai Yu; Minyi Guo; Jieru Zhao

arXiv:2505.14741·cs.LG·October 14, 2025

Open Access · 1 Video

TL;DR

This paper introduces ParaStep, a communication-efficient parallelization method for diffusion models. It reduces inference latency by exploiting the similarity between adjacent denoising steps, with reported speedups of up to 6.56x on AudioLDM2-large while maintaining generation quality.

Contribution

ParaStep is a parallelization method that accelerates diffusion inference through a reuse-then-predict mechanism. It relies on lightweight, step-wise communication rather than the layer-wise or stage-wise communication used by prior approaches.
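To make the communication claim concrete, the following is a back-of-the-envelope counting sketch, not a measurement from the paper. It assumes a simplified model in which a layer-wise scheme synchronizes once per layer in every denoising step, while a step-wise scheme synchronizes once per denoising step. The function name `sync_points` and the values 50 steps and 28 layers are hypothetical, and message sizes are ignored.

```python
def sync_points(num_steps: int, num_layers: int, granularity: str) -> int:
    """Count synchronization points under a simplified, assumed cost model.

    "layer": one synchronization per layer per denoising step.
    "step":  one synchronization per denoising step.
    """
    if granularity == "layer":
        return num_steps * num_layers
    if granularity == "step":
        return num_steps
    raise ValueError(f"unknown granularity: {granularity!r}")


# Hypothetical configuration, chosen only for illustration.
layer_wise = sync_points(num_steps=50, num_layers=28, granularity="layer")
step_wise = sync_points(num_steps=50, num_layers=28, granularity="step")
print(layer_wise, step_wise)  # prints: 1400 50
```

Under this assumed model the number of synchronization points shrinks by a factor equal to the layer count. The real saving depends on message sizes and interconnect bandwidth, which this count does not capture.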

Findings

01. Achieves up to 6.56x speedup on AudioLDM2-large.

02. Reduces communication overhead compared to prior methods.

03. Maintains generation quality while significantly speeding up inference.

Abstract

Diffusion models have emerged as a powerful class of generative models across various modalities, including image, video, and audio synthesis. However, their deployment is often limited by significant inference latency, primarily due to the inherently sequential nature of the denoising process. While existing parallelization strategies attempt to accelerate inference by distributing computation across multiple devices, they typically incur high communication overhead, hindering deployment on commercial hardware. To address this challenge, we propose ParaStep, a novel parallelization method based on a reuse-then-predict mechanism that parallelizes diffusion inference by exploiting similarity between adjacent denoising steps. Unlike prior approaches that rely on layer-wise or stage-wise communication, ParaStep employs lightweight, step-wise communication, substantially reducing…
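The premise that adjacent denoising steps are similar can be illustrated with a toy, single-process sketch. This is not the ParaStep algorithm: the abstract as shown here is truncated and does not specify the predict phase or how work is split across devices. The sketch only shows that when a noise prediction changes slowly from one step to the next, reusing the previous step's prediction introduces a small deviation while skipping a model call. The denoiser `toy_noise_pred`, the Euler-style update, and the `reuse_every` parameter are all hypothetical stand-ins.

```python
import math


def toy_noise_pred(x, t):
    """Hypothetical stand-in for a denoiser network, smooth in x and t."""
    return [math.tanh(v) * (0.5 + 0.5 * t) for v in x]


def denoise(x0, num_steps, reuse_every=0):
    """Run a toy Euler-style denoising loop.

    reuse_every=0 calls the model at every step (sequential baseline).
    reuse_every=k reuses the cached prediction on every k-th step
    instead of calling the model again.
    """
    x = list(x0)
    step_size = 1.0 / num_steps
    cached = None
    model_calls = 0
    for i in range(num_steps):
        t = 1.0 - i * step_size
        reuse = (
            reuse_every > 0
            and cached is not None
            and i % reuse_every == reuse_every - 1
        )
        if not reuse:
            cached = toy_noise_pred(x, t)
            model_calls += 1
        # Apply the (fresh or reused) prediction to move to the next step.
        x = [v - step_size * e for v, e in zip(x, cached)]
    return x, model_calls


x0 = [2.0, -1.5, 0.3, 0.9]
baseline, baseline_calls = denoise(x0, num_steps=50)
reused, reused_calls = denoise(x0, num_steps=50, reuse_every=2)
max_dev = max(abs(a - b) for a, b in zip(baseline, reused))
print(baseline_calls, reused_calls, max_dev)
```

In this toy setting, reusing every second prediction halves the number of model calls and the final sample stays close to the baseline. How closely a real diffusion model behaves this way depends on the model and sampler, which is the question the paper's evaluation addresses.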

Videos

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism · SlidesLive

Taxonomy

Topics: Advanced Computing and Algorithms · Image and Signal Denoising Methods · Neural Networks and Applications

Methods: Diffusion