PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving
Desen Sun, Zepeng Zhao, Yuke Wang

TL;DR
PatchedServe is a novel framework that improves the efficiency and responsiveness of diffusion model serving by using patch-based processing, patch-level caching, and SLO-aware scheduling for heterogeneous resolutions.
Contribution
PatchedServe is presented as the first SLO-optimized diffusion serving framework that handles multi-resolution inputs, combining patch-based processing with patch-level cache policies.
Findings
Achieves 30.1% higher SLO satisfaction than state-of-the-art systems.
Improves throughput for hybrid-resolution diffusion inputs.
Maintains image quality while optimizing responsiveness.
Abstract
The Text-to-Image (T2I) diffusion model has emerged as one of the most widely adopted generative models. However, serving diffusion models at the granularity of entire images introduces significant challenges, particularly under multi-resolution workloads. First, image-level serving obstructs batching across requests. Second, heterogeneous resolutions exhibit distinct locality characteristics, making it difficult to apply a uniform cache policy effectively. To address these challenges, we present PatchedServe, a Patch Management Framework for SLO-Optimized Hybrid-Resolution Diffusion Serving. PatchedServe is the first SLO-optimized T2I diffusion serving framework designed to handle heterogeneous resolutions. Specifically, it incorporates a novel patch-based processing workflow that substantially improves throughput for hybrid-resolution inputs. Moreover, PatchedServe devises a…
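To illustrate why patch granularity helps batching, the following is a minimal sketch and not the authors' implementation. It assumes each request's latent is a 2-D grid whose sides are multiples of a fixed patch size; the names `PATCH`, `PatchRef`, `split_into_patches`, `build_batch`, and `reassemble` are hypothetical. Requests of different resolutions are cut into equally sized patches, which can then sit in one batch and be scattered back to their source requests afterwards.

```python
from dataclasses import dataclass

PATCH = 2  # hypothetical patch edge length (assumption, not from the paper)


@dataclass(frozen=True)
class PatchRef:
    """Records where a patch came from so it can be put back later."""
    request_id: int
    row: int
    col: int


def split_into_patches(request_id, image, patch=PATCH):
    """Cut one 2-D grid into patch x patch tiles, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    refs, patches = [], []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            patches.append([row[c:c + patch] for row in image[r:r + patch]])
            refs.append(PatchRef(request_id, r, c))
    return refs, patches


def build_batch(requests, patch=PATCH):
    """Merge patches from requests of any resolution into one uniform batch."""
    refs, batch = [], []
    for request_id, image in requests.items():
        r, p = split_into_patches(request_id, image, patch)
        refs.extend(r)
        batch.extend(p)
    return refs, batch


def reassemble(refs, batch, shapes, patch=PATCH):
    """Scatter processed patches back into per-request grids."""
    out = {rid: [[0] * w for _ in range(h)] for rid, (h, w) in shapes.items()}
    for ref, tile in zip(refs, batch):
        for dr, tile_row in enumerate(tile):
            out[ref.request_id][ref.row + dr][ref.col:ref.col + patch] = tile_row
    return out
```

In this sketch a 2x2 request and a 4x4 request yield 1 and 4 patches respectively, so a single batch of 5 identically shaped tiles replaces two image-level batches of different shapes. A real diffusion model would also need to handle context across patch boundaries, which this example ignores.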
Taxonomy
Topics: Iterative Learning Control Systems · Engineering and Test Systems · VLSI and Analog Circuit Testing
