Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference
XiuYu Zhang, Zening Luo, Michelle E. Lu

TL;DR
This paper introduces Partially Conditioned Patch Parallelism (PCPP), a novel method that accelerates high-resolution diffusion model inference by reducing communication and computation, enabling faster image generation with minimal quality loss.
Contribution
The paper proposes PCPP, a new parallelism technique that leverages partial conditioning to significantly speed up diffusion model inference while reducing communication overhead.
Findings
Achieves a 2.36x to 8.02x inference speed-up using 4 to 8 GPUs.
Reduces communication cost by around 70% compared to the state of the art.
Maintains high image quality with faster inference.
Abstract
Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation. However, the inference speed of diffusion models is limited by the slow sampling process, which restricts their use cases. The sequential denoising needed to generate a single sample can take tens or hundreds of iterations and has thus become a significant bottleneck. This limitation is more salient for applications that are interactive in nature or require low latency. To address this challenge, we propose Partially Conditioned Patch Parallelism (PCPP) to accelerate the inference of high-resolution diffusion models. Using the fact that the difference between the images in adjacent diffusion steps is nearly zero, Patch Parallelism (PP) leverages multiple GPUs communicating asynchronously to compute patches of an image on multiple computing devices based on…
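The patch-parallel idea described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "denoiser" is a stand-in vertical 3-tap average rather than a diffusion model, the image is split into horizontal strips (one per simulated device), and "partial conditioning" is modeled as each device receiving only `halo` boundary rows from its neighbors' previous-step patches instead of the whole previous-step image. The names `toy_denoise`, `partially_conditioned_step`, and `halo` are hypothetical.

```python
import numpy as np

def toy_denoise(ctx, halo_top, halo_bottom):
    """Stand-in for one denoiser call (NOT a diffusion model):
    a vertical 3-tap average, returning only the rows of the local patch."""
    padded = np.pad(ctx, ((1, 1), (0, 0)), mode="edge")
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    return smoothed[halo_top: ctx.shape[0] - halo_bottom]

def partially_conditioned_step(patches_prev, halo):
    """One simulated step: device i updates patch i from its own rows plus
    `halo` boundary rows of each neighbor, all taken from the previous step."""
    n = len(patches_prev)
    new_patches, rows_received = [], 0
    for i, own in enumerate(patches_prev):
        empty = own[:0]  # zero-row placeholder when there is no neighbor
        top = patches_prev[i - 1][-halo:] if (i > 0 and halo > 0) else empty
        bottom = patches_prev[i + 1][:halo] if (i < n - 1 and halo > 0) else empty
        rows_received += top.shape[0] + bottom.shape[0]
        ctx = np.concatenate([top, own, bottom], axis=0)
        new_patches.append(toy_denoise(ctx, top.shape[0], bottom.shape[0]))
    return new_patches, rows_received

rng = np.random.default_rng(0)
latent = rng.standard_normal((64, 64))          # toy "previous-step" image
n_devices, halo = 4, 4                          # assumed values
patches = np.split(latent, n_devices, axis=0)   # 16 rows per device

new_patches, partial_rows = partially_conditioned_step(patches, halo)
result = np.concatenate(new_patches, axis=0)

# Rows each device would receive if it were conditioned on the full image:
# every device needs all rows it does not already own.
full_rows = n_devices * (latent.shape[0] - patches[0].shape[0])

print("rows received, partial conditioning:", partial_rows)
print("rows received, full conditioning:   ", full_rows)
```

In this toy setting the stencil only reaches one row, so a halo of at least one row reproduces the single-device result exactly. A real diffusion backbone has a much larger (often global) receptive field, so conditioning on part of the previous-step image is an approximation, which is where the trade-off between communication savings and image quality would arise. The row counts printed here are properties of this sketch only and say nothing about the figures reported in the paper.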
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · NMR spectroscopy and applications
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
