FastUSP: A Multi-Level Collaborative Acceleration Framework for Distributed Diffusion Model Inference
Guandong Li

TL;DR
FastUSP introduces a multi-level optimization framework that significantly accelerates distributed diffusion model inference on multi-GPU systems by reducing kernel launch overhead and improving computation-communication efficiency.
Contribution
It presents a novel multi-level optimization framework that combines compile-level, communication-level, and operator-level techniques to accelerate USP-based distributed diffusion inference.
Findings
Achieves a 1.12x–1.16x speedup on FLUX (12B parameters).
Attains a 1.09x speedup on Qwen-Image with 2 GPUs.
Kernel launch overhead is identified as the main bottleneck on systems with modern GPU interconnects.
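To see why launch overhead can matter at all, the following is a toy cost model, not the paper's analysis. It assumes a launch-bound step in which every kernel launch adds a fixed host-side cost that does not overlap with GPU work, and it assumes that capturing the step in a CUDA Graph collapses those launches into one. The function name and every number below are hypothetical, chosen only to make the arithmetic easy; they are not measurements from FastUSP.

```python
def step_time_us(num_kernels, launch_us, compute_us, graph_launches=None):
    """Toy model: step time = (number of launches * per-launch cost) + GPU work.

    Assumption: launch cost is fully exposed (no overlap with GPU execution).
    If graph_launches is given, it replaces the per-kernel launch count,
    mimicking a step that is replayed as a captured graph.
    """
    launches = num_kernels if graph_launches is None else graph_launches
    return launches * launch_us + compute_us


# Hypothetical inputs, not measurements: 2000 kernels per denoising step,
# 5 us per launch, 90 ms of actual GPU work.
eager = step_time_us(num_kernels=2000, launch_us=5.0, compute_us=90_000.0)
graphed = step_time_us(num_kernels=2000, launch_us=5.0, compute_us=90_000.0,
                       graph_launches=1)
speedup = eager / graphed  # about 1.11 for these made-up inputs
```

Under this model the achievable gain is bounded by the share of the step spent on launches, which is why the benefit shrinks as kernels get larger or fewer.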
Abstract
Large-scale diffusion models such as FLUX (12B parameters) and Stable Diffusion 3 (8B parameters) require multi-GPU parallelism for efficient inference. Unified Sequence Parallelism (USP), which combines Ulysses and Ring attention mechanisms, has emerged as the state-of-the-art approach for distributed attention computation. However, existing USP implementations suffer from significant inefficiencies including excessive kernel launch overhead and suboptimal computation-communication scheduling. In this paper, we propose FastUSP, a multi-level optimization framework that integrates compile-level optimization (graph compilation with CUDA Graphs and computation-communication reordering), communication-level optimization (FP8 quantized collective communication), and operator-level optimization (pipelined Ring attention with double buffering). We evaluate FastUSP on FLUX (12B) and…
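The operator-level idea named in the abstract, Ring attention with double buffering, can be sketched in a single process. This is a minimal simulation under stated assumptions, not the FastUSP implementation: ranks are list entries rather than GPUs, the "transfer" is a plain assignment, attention is non-causal with one head, and the usual 1/sqrt(d) score scaling is omitted. All function names (attend_block, ring_attention, full_attention, make_shards) are hypothetical.

```python
import math
import random


def attend_block(q, k_blk, v_blk, state):
    """Fold one K/V block into a query's running (max, denom, acc) softmax state."""
    m, l, acc = state
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
    m_new = max(m, max(scores))
    scale = math.exp(m - m_new)  # rescales old state; exp(-inf) == 0.0 on the first block
    l *= scale
    acc = [a * scale for a in acc]
    for s, v in zip(scores, v_blk):
        w = math.exp(s - m_new)
        l += w
        acc = [a + w * vj for a, vj in zip(acc, v)]
    return m_new, l, acc


def ring_attention(q_shards, k_shards, v_shards):
    """Simulate P ranks passing K/V blocks around a ring with two buffers per rank."""
    P = len(q_shards)
    d_v = len(v_shards[0][0])
    states = [[(float("-inf"), 0.0, [0.0] * d_v) for _ in q_shards[r]] for r in range(P)]
    # bufs[r][cur] is computed on; bufs[r][1 - cur] receives the next block.
    bufs = [[(k_shards[r], v_shards[r]), None] for r in range(P)]
    cur = 0
    for step in range(P):
        nxt = 1 - cur
        if step < P - 1:
            # "Post" the transfer first: it reads only `cur` slots and writes only
            # `nxt` slots, so on real hardware it could overlap with the compute below.
            for r in range(P):
                bufs[r][nxt] = bufs[(r - 1) % P][cur]
        for r in range(P):
            k_blk, v_blk = bufs[r][cur]
            states[r] = [attend_block(q, k_blk, v_blk, st)
                         for q, st in zip(q_shards[r], states[r])]
        cur = nxt  # swap buffers
    return [[[a / l for a in acc] for (_, l, acc) in states[r]] for r in range(P)]


def full_attention(qs, ks, vs):
    """Reference: ordinary softmax attention over the unsharded sequence."""
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in ks]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, vs)) / z for j in range(len(vs[0]))])
    return out


def make_shards(P=4, n=3, d=4, seed=0):
    """Random Q, K, V, each split into P shards of n tokens with d features."""
    rng = random.Random(seed)
    def mk():
        return [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)] for _ in range(P)]
    return mk(), mk(), mk()
```

Because the transfer for the next step touches only the idle buffer, it never races with the block being consumed. That separation is what lets a real implementation hide communication behind the attention kernel. The online-softmax rescaling in attend_block keeps the result identical to unsharded attention up to floating-point rounding.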
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Advanced Neuroimaging Techniques and Applications
