AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training

Yucheng Guo; Yongjian Guo; Zhong Guan; Haoran Sun; Wen Huang; Wanting Xu; Jing Long; Shuai Di; Junwu Xiong

arXiv:2605.17923·cs.DC·May 19, 2026

TL;DR

AdaptiveLoad combines a dual-constraint load-balancing framework with a specialized CUDA kernel to improve the training efficiency of large-scale video diffusion Transformers, targeting computational load imbalance and memory bottlenecks.

Contribution

The paper presents an adaptive load-balancing system and a fused CUDA kernel that improve GPU utilization and training throughput for video diffusion models.

Findings

01. Reduced the computational imbalance rate from 39% to 18.9%

02. Improved peak VRAM utilization efficiency by 22.7%

03. Achieved a 27.2% increase in training throughput

Abstract

In video generation models, particularly world models, training large-scale video diffusion Transformers (such as DiT and MMDiT) poses significant computational challenges because sequence lengths vary widely within mixed-mode datasets. Existing bucket-based data loading strategies typically rely on "equal token length" constraints. This approach ignores the quadratic complexity of self-attention, leading to severe load imbalance and underutilization of GPU resources. This paper proposes AdaptiveLoad, an integrated optimization framework consisting of two core components: (1) a dual-constraint adaptive load balancing system, which eliminates long-sequence bottlenecks by simultaneously limiting memory consumption and computational load ($B \times S^{p} \leq M_{comp}$); (2) a fused LayerNorm-Modulate CUDA kernel, which utilizes a D-tile…
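To make the dual-constraint idea concrete, here is a minimal Python sketch of choosing a per-bucket batch size B for sequence length S. Only the compute constraint B × S^p ≤ M_comp comes from the abstract. The memory constraint is assumed here to take the "equal token length" form B × S ≤ M_mem, and the function name `max_batch_size`, the budget values, and the default p = 2 (quadratic attention) are all hypothetical choices for illustration, not the paper's implementation.

```python
def max_batch_size(seq_len: int, mem_budget: int, comp_budget: int, p: float = 2.0) -> int:
    """Largest B with B * S <= mem_budget and B * S**p <= comp_budget.

    Assumption: memory scales linearly with tokens (B * S); only the compute
    constraint B * S**p <= M_comp is stated in the abstract. A result of 0
    means a single sequence of this length already exceeds a budget.
    """
    by_memory = mem_budget // seq_len              # token-count limit
    by_compute = int(comp_budget // seq_len ** p)  # attention-cost limit
    return min(by_memory, by_compute)


if __name__ == "__main__":
    mem_budget = 65536      # hypothetical token budget per step
    comp_budget = 2 ** 28   # hypothetical compute budget, in units of S**p
    for seq_len in (1024, 4096, 16384):
        token_only = mem_budget // seq_len
        dual = max_batch_size(seq_len, mem_budget, comp_budget)
        print(f"S={seq_len:6d}  token-only B={token_only:3d}  dual-constraint B={dual:3d}")
```

With these illustrative budgets, the token-only rule gives the 16384-token bucket a batch of 4, whose B × S² cost is 16 times that of the 1024-token bucket at batch 64. The additional compute constraint caps that bucket at a batch of 1, which is the kind of long-sequence bottleneck the abstract describes.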
