LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

Lingyun Yang; Suyi Li; Tianyu Feng; Xiaoxiao Jiang; Zhipeng Di; Weiyi Lu; Kan Liu; Yinghao Yu; Tao Lan; Guodong Yang; Lin Qu; Liping Zhang; Wei Wang

arXiv:2604.08123 · cs.DC · April 10, 2026

TL;DR

LegoDiffusion is a micro-serving system for text-to-image diffusion workflows. It manages each model in a workflow independently, sustaining up to 3x higher request rates than existing systems and tolerating up to 8x higher burst traffic.

Contribution

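LegoDiffusion decomposes a diffusion workflow into loosely coupled model-execution nodes that are managed and scheduled independently. This enables per-model scaling, model sharing, and adaptive model parallelism, and lets it outperform existing diffusion workflow serving systems.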

Findings

01. Sustains up to 3x higher request rates than existing systems.

02. Tolerates up to 8x higher burst traffic.

03. Enables per-model scaling and model sharing.

Abstract

Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Collectively, LegoDiffusion outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up…
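
To make the decomposition idea concrete, here is a minimal sketch of a workflow expressed as loosely coupled model-execution nodes, with replica pools sized per model rather than per workflow. It is an illustration under stated assumptions, not LegoDiffusion's actual interface: the `Node` structure, the helper functions, the model names (`text_encoder`, `controlnet`, `vae_decoder`), and all request rates and capacities are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter
from math import ceil


@dataclass(frozen=True)
class Node:
    """One model-execution step in a workflow (hypothetical structure)."""
    name: str          # step name, unique within its workflow
    model: str         # model this step runs; the unit of sharing and scaling
    deps: tuple = ()   # names of upstream steps whose outputs this step consumes


def execution_order(workflow):
    """Return step names in an order that respects the dataflow edges."""
    graph = {node.name: set(node.deps) for node in workflow}
    return list(TopologicalSorter(graph).static_order())


def per_model_demand(workflows, request_rates):
    """Sum request rates per model across every workflow that uses it."""
    demand = defaultdict(float)
    for wf_name, workflow in workflows.items():
        for node in workflow:
            demand[node.model] += request_rates[wf_name]
    return dict(demand)


def replicas_needed(demand, capacity):
    """Size each model's replica pool independently of the other models."""
    return {model: ceil(rate / capacity[model]) for model, rate in demand.items()}


# Two illustrative workflows that share three of their models.
workflows = {
    "plain": [
        Node("encode", "text_encoder"),
        Node("denoise", "base_diffusion", ("encode",)),
        Node("decode", "vae_decoder", ("denoise",)),
    ],
    "controlled": [
        Node("encode", "text_encoder"),
        Node("control", "controlnet"),
        Node("denoise", "base_diffusion", ("encode", "control")),
        Node("decode", "vae_decoder", ("denoise",)),
    ],
}
request_rates = {"plain": 4.0, "controlled": 2.0}   # requests/s, made up
capacity = {                                        # requests/s per replica, made up
    "text_encoder": 20.0,
    "controlnet": 4.0,
    "base_diffusion": 1.5,
    "vae_decoder": 10.0,
}

demand = per_model_demand(workflows, request_rates)
replicas = replicas_needed(demand, capacity)
```

With these made-up numbers, only the base diffusion model needs more than one replica, while the lighter models are each served by a single replica shared by both workflows. A monolithic deployment would instead replicate every model in a workflow together whenever the workflow is scaled.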
