SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference
Ziqi Zhou, Peng Yang, Yuxin Liang, Mingliu Liu, Jia Lu

TL;DR
SynerDiff is a novel batching system that enhances diffusion model inference by reducing latency and increasing throughput through intra- and inter-concurrency optimizations.
Contribution
It introduces a synergy-based batching approach with adaptive scheduling and feedback control to optimize resource utilization and latency in diffusion model serving.
Findings
Increases throughput by 1.6× over baseline systems.
Reduces average E2E and P99 tail latencies by up to 78.7%.
Maintains high image fidelity during optimization.
Abstract
The expansion of Artificial Intelligence-generated content services requires diffusion model serving to achieve high throughput and low end-to-end (E2E) task latency simultaneously. However, existing continuous batching methods suffer from severe resource contention when the UNet and VAE run concurrently, leading to latency spikes. Furthermore, concurrent multi-task scheduling entails a trade-off between UNet throughput and VAE latency that shifts with the scheduling strategy. To address these issues, we propose SynerDiff, an efficient continuous batching system built on intra-inter level synergy. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, leveraging components' differential sensitivity to scheduling granularities, a threshold-aware scheduler plans concurrent…
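The two intra-concurrency ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The classifier-free guidance (CFG) combination is the standard formula; everything else is an assumption made for illustration: the fixed `skip_after` rule stands in for whatever adaptive criterion SynerDiff uses, chunking is shown over a batch of latents (the paper may chunk differently), and `predict` and `decode` are hypothetical stand-ins for UNet and VAE calls.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Standard CFG: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def guided_noise(
    step: int,
    total_steps: int,
    predict: Callable[[bool], Vector],  # hypothetical UNet call: predict(conditional)
    scale: float,
    skip_after: float = 0.6,  # assumed fixed threshold, not the paper's adaptive rule
) -> Vector:
    """Return the noise estimate, skipping the unconditional pass late in sampling."""
    eps_cond = predict(True)
    if step / total_steps >= skip_after:
        # Skip-CFG: one UNet pass instead of two for this step.
        return eps_cond
    return cfg_combine(predict(False), eps_cond, scale)


def decode_in_chunks(
    latents: Sequence[float],
    decode: Callable[[Sequence[float]], List[float]],  # hypothetical VAE decode
    chunk_size: int,
) -> List[float]:
    """Decode latents in fixed-size chunks to bound the work done per VAE call."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    decoded: List[float] = []
    for start in range(0, len(latents), chunk_size):
        decoded.extend(decode(latents[start:start + chunk_size]))
    return decoded
```

With CFG enabled, each denoising step costs two UNet evaluations, so skipping the unconditional pass on a step roughly halves that step's UNet work. Chunked decoding trades one large VAE call for several smaller ones, which gives a scheduler more points at which to interleave other work.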
