DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving

Heyang Huang; Cunchen Hu; Jiaqi Zhu; Ziyuan Gao; Liangliang Xu; Yizhou Shan; Yungang Bao; Sun Ninghui; Tianwei Zhang; Sa Wang

arXiv:2506.13497 · cs.DC · June 17, 2025

TL;DR

DDiT is a serving system for diffusion transformer (DiT) models that allocates resources dynamically, balancing the degree of parallelism and GPU use across model modules to reduce latency (by up to 1.44x at p99 and 1.43x on average) and improve efficiency.

Contribution

The paper introduces DDiT, a resource management system for diffusion transformer models that decouples control and schedules resources dynamically, addressing the inefficiencies of existing monolithic deployment methods.

Findings

1. Up to 1.44x reduction in p99 latency
2. Up to 1.43x reduction in average latency
3. Effective dynamic resource scaling across diverse datasets
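Both latency findings are ratios of a baseline system's latency to DDiT's latency. The Python sketch below shows that arithmetic, assuming the nearest-rank definition of p99 (the percentile method the paper uses is not stated on this page). The function names are hypothetical, and the sample latencies are made up for illustration; they are not results from the paper.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list of latencies, q in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: take the ceil(q/100 * n)-th smallest value (1-indexed).
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def reduction_factor(baseline, improved):
    """How many times smaller `improved` is than `baseline` (e.g. a 1.44x reduction)."""
    if improved <= 0:
        raise ValueError("improved latency must be positive")
    return baseline / improved


if __name__ == "__main__":
    # Hypothetical per-request latencies in seconds; illustrative only.
    baseline = [10.0, 12.0, 14.0, 30.0]
    ddit = [8.0, 9.0, 10.0, 20.0]
    p99_gain = reduction_factor(percentile(baseline, 99), percentile(ddit, 99))
    avg_gain = reduction_factor(statistics.mean(baseline), statistics.mean(ddit))
    print(f"p99 reduction: {p99_gain:.2f}x, average reduction: {avg_gain:.2f}x")
```

Because p99 is driven by the slowest requests, the two ratios can differ even on the same workload, which is why the findings report them separately.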

Abstract

Text-to-Video (T2V) models aim to generate dynamic and expressive videos from textual prompts. The generation pipeline typically involves multiple modules, such as a language encoder, a Diffusion Transformer (DiT), and a Variational Autoencoder (VAE). Existing serving systems often deploy these modules monolithically and overlook the distinct characteristics of each one, which leads to inefficient GPU utilization. In addition, DiT exhibits varying performance gains across different resolutions and degrees of parallelism, leaving significant optimization potential unexplored. To address these problems, we present DDiT, a flexible system that integrates both inter-phase and intra-phase optimizations. DDiT focuses on two key metrics: the optimal degree of parallelism, which prevents excessive parallelism for specific resolutions, and starvation time, which quantifies the sacrifice of…
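The abstract does not spell out how DDiT computes the optimal degree of parallelism. As a hypothetical illustration of the idea of avoiding excessive parallelism for a given resolution, the Python sketch below takes a profiled table of DiT latency per GPU count and picks the largest degree whose parallel efficiency (speedup over one GPU divided by the GPU count) stays at or above a threshold. The function name `optimal_degree`, the 0.7 threshold, and the profile numbers are assumptions, not DDiT's published algorithm or measurements.

```python
from typing import Dict


def optimal_degree(latency_by_degree: Dict[int, float], min_efficiency: float = 0.7) -> int:
    """Return the largest degree of parallelism whose parallel efficiency
    is at least `min_efficiency`.

    Hypothetical heuristic for illustration; not DDiT's actual method.
    """
    if 1 not in latency_by_degree:
        raise ValueError("need a single-GPU baseline measurement (degree 1)")
    base = latency_by_degree[1]
    best = 1
    for degree in sorted(latency_by_degree):
        speedup = base / latency_by_degree[degree]
        efficiency = speedup / degree
        if efficiency >= min_efficiency:
            best = degree
        else:
            # Stop at the first degree below the threshold: adding GPUs
            # past this point is treated as excessive parallelism.
            break
    return best


if __name__ == "__main__":
    # Hypothetical DiT latency profiles (arbitrary time units) keyed by GPU count.
    low_res = {1: 10.0, 2: 6.0, 4: 4.5, 8: 4.0}
    high_res = {1: 80.0, 2: 41.0, 4: 21.0, 8: 11.5}
    print("low-res degree:", optimal_degree(low_res))
    print("high-res degree:", optimal_degree(high_res))
```

With these made-up profiles, the low-resolution request stops at 2 GPUs while the high-resolution one scales to 8, which is the kind of resolution-dependent choice the abstract describes. A real scheduler would also weigh queueing and cluster load, which this sketch ignores.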

Taxonomy

Topics: Neural Networks and Applications