SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules
Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Dakai An, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

TL;DR
SwiftDiffusion is a serving system that reduces latency and improves throughput for diffusion-based text-to-image generation by decoupling ControlNet from the base model, caching and parallelizing it, and loading LoRA asynchronously.
Contribution
It proposes a holistic system design that decouples ControlNet, employs bounded asynchronous LoRA loading, and introduces latent parallelism for efficient diffusion model serving.
Findings
Achieves up to 7.8x latency reduction.
Realizes 1.6x throughput improvement.
Maintains image quality while optimizing performance.
Abstract
Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow where a base diffusion model is augmented with various "add-on" modules, notably ControlNet and LoRA, to enhance image generation control. Compared to serving the base model alone, these add-on modules introduce significant loading and computational overhead, resulting in increased latency. In this paper, we present SwiftDiffusion, a system that efficiently serves a T2I workflow through a holistic approach. SwiftDiffusion decouples ControlNet from the base model and deploys it as a separate, independently scaled service on dedicated GPUs, enabling ControlNet caching, parallelization, and sharing. To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL)…
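To make the ControlNet caching idea concrete, here is a minimal sketch of how a standalone ControlNet service might keep recently used modules resident instead of reloading them per request. It is an illustration under assumptions, not the paper's implementation: the `ControlNetCache` class, the `loader` callback, the least-recently-used eviction policy, and the module names are all hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable


class ControlNetCache:
    """Hypothetical LRU cache of loaded ControlNet modules.

    `loader` stands in for whatever expensive call reads a ControlNet
    from storage onto a GPU; it is an assumption, not a real API.
    """

    def __init__(self, capacity: int, loader: Callable[[str], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader
        self._modules: "OrderedDict[str, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> Any:
        """Return the module for `name`, loading it only on a cache miss."""
        if name in self._modules:
            # Mark as most recently used so it is evicted last.
            self._modules.move_to_end(name)
            self.hits += 1
            return self._modules[name]

        self.misses += 1
        module = self.loader(name)
        self._modules[name] = module
        if len(self._modules) > self.capacity:
            # Evict the least recently used module.
            self._modules.popitem(last=False)
        return module


if __name__ == "__main__":
    load_log = []

    def fake_loader(name: str) -> str:
        # Records each (simulated) expensive load.
        load_log.append(name)
        return f"module:{name}"

    cache = ControlNetCache(capacity=2, loader=fake_loader)
    for request in ["canny", "depth", "canny", "pose", "depth"]:
        cache.get(request)
    print(load_log, cache.hits, cache.misses)
```

Because the cache lives in the ControlNet service rather than next to the base model, requests that use the same ControlNet can share one loaded copy, which is the sharing benefit the abstract mentions.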
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Activation Patching · Diffusion · Balanced Selection
