SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference

Ziqi Zhou; Peng Yang; Yuxin Liang; Mingliu Liu; Jia Lu

arXiv:2605.08835 · cs.AI · May 12, 2026

TL;DR

SynerDiff is a continuous batching system for diffusion model serving that reduces latency and increases throughput through optimizations at both the intra-concurrency and inter-concurrency levels.

Contribution

It introduces a synergy-based batching approach with adaptive scheduling and feedback control to optimize resource utilization and latency in diffusion model serving.

Findings

1. Increases throughput by 1.6× compared with baseline systems.

2. Reduces average E2E and P99 tail latencies by up to 78.7%.

3. Maintains high image fidelity despite these optimizations.

Abstract

The expansion of AI-generated content services requires diffusion model serving to achieve both high throughput and low end-to-end (E2E) task latency. However, existing continuous batching methods suffer from severe resource contention when the UNet and VAE run concurrently, which leads to latency spikes. Furthermore, scheduling multiple concurrent tasks entails a trade-off between UNet throughput and VAE latency that varies with the scheduling strategy. To address these challenges, we propose SynerDiff, an efficient continuous batching system built on synergy between the intra- and inter-concurrency levels. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, leveraging the components' differing sensitivity to scheduling granularity, a threshold-aware scheduler plans concurrent…
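
The abstract names VAE Chunking and Adaptive Skip-CFG without describing how they work. The Python sketch below is a hypothetical illustration of the two general ideas, not SynerDiff's actual algorithms. It shows (a) standard classifier-free guidance (CFG) and how skipping the unconditional UNet pass on some denoising steps reduces UNet calls, and (b) splitting pending VAE decodes into fixed-size chunks so they can be scheduled as smaller units of work. The function names `cfg_combine`, `denoise_with_skip_cfg`, and `chunk_vae_requests`, the toy UNet, and the fixed "guide only steps 0 to 5" rule are all assumptions; the paper's skip policy is adaptive, and its criteria are not given in this excerpt.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Vector = List[float]


def cfg_combine(eps_uncond: Sequence[float], eps_cond: Sequence[float], w: float) -> Vector:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def denoise_with_skip_cfg(
    unet: Callable[[int, bool], Vector],
    num_steps: int,
    guidance_scale: float,
    should_skip: Callable[[int], bool],
) -> Tuple[List[Vector], int]:
    """Collect one noise prediction per step, dropping the unconditional pass
    on steps where should_skip(step) is True. Returns (predictions, UNet calls)."""
    preds: List[Vector] = []
    unet_calls = 0
    for step in range(num_steps):
        eps_c = unet(step, True)  # conditional branch: always evaluated
        unet_calls += 1
        if should_skip(step):
            preds.append(eps_c)  # CFG skipped: conditional prediction used alone
        else:
            eps_u = unet(step, False)  # unconditional branch
            unet_calls += 1
            preds.append(cfg_combine(eps_u, eps_c, guidance_scale))
    return preds, unet_calls


def chunk_vae_requests(requests: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Split pending VAE decodes into chunks of at most chunk_size items,
    so each chunk can be scheduled as a separate, smaller unit of work."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [list(requests[i:i + chunk_size]) for i in range(0, len(requests), chunk_size)]


# Toy usage: a fake "UNet" returning fixed 2-element noise predictions.
def toy_unet(step: int, conditional: bool) -> Vector:
    return [1.0, 2.0] if conditional else [0.5, 1.0]


# Hypothetical fixed rule standing in for an adaptive policy: guide only steps 0-5.
preds, calls = denoise_with_skip_cfg(toy_unet, num_steps=10, guidance_scale=7.5,
                                     should_skip=lambda s: s >= 6)
print(calls)  # 16 = 6 guided steps x 2 calls + 4 skipped steps x 1 call
print(preds[0])  # [4.25, 8.5]
print(chunk_vae_requests(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```

Without skipping, the same 10 steps would take 20 UNet calls. The open design question an adaptive policy has to answer is which steps can drop guidance without degrading image fidelity.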
