Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, Rui Chen, Victor Shea-Jay Huang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai

TL;DR
This paper systematically analyzes and confirms the existence of scaling laws for video diffusion transformers, proposing a new law to optimize hyperparameters and improve performance-cost trade-offs.
Contribution
It introduces a novel scaling law for video diffusion transformers that predicts optimal hyperparameters and enhances performance within compute constraints.
Findings
Confirmed the presence of scaling laws in video diffusion models
Found that video diffusion models are more sensitive to learning rate and batch size than language models
Reduced inference costs by 40.1% under the same compute budget
Abstract
Achieving optimal performance of video diffusion transformers within a given data and compute budget is crucial because of their high training costs. This requires precisely determining the optimal model size and training hyperparameters before large-scale training. While scaling laws are employed in language models to predict performance, their existence and accurate derivation in visual generation models remain underexplored. In this paper, we systematically analyze scaling laws for video diffusion transformers and confirm their presence. Moreover, we discover that, unlike language models, video diffusion models are more sensitive to learning rate and batch size, two hyperparameters that are often not precisely modeled. To address this, we propose a new scaling law that predicts optimal hyperparameters for any model size and compute budget. Under these optimal settings, we achieve comparable…
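The abstract says the proposed law predicts optimal hyperparameters (such as learning rate and batch size) for any model size and compute budget, but this excerpt does not give its functional form. As a minimal sketch, assuming the optimum follows a simple power law y = k * N^alpha * C^beta in model size N and compute budget C (an illustrative assumption, not the paper's law), the coefficients can be fitted by ordinary least squares in log space. The names `fit_power_law`, `predict`, and `solve_linear_system` are hypothetical, and the data in the usage block are synthetic, not results from the paper.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update A and b together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_power_law(samples: Sequence[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Fit y = k * N**alpha * C**beta by least squares on log y.

    Each sample is (model size N, compute budget C, observed optimal value y).
    ASSUMPTION: a pure power law in N and C; the paper's actual form may differ.
    Returns (k, alpha, beta).
    """
    # Design-matrix rows [1, log N, log C] with target log y.
    rows = [[1.0, math.log(n), math.log(c)] for n, c, _ in samples]
    targets = [math.log(y) for _, _, y in samples]
    # Normal equations: (X^T X) w = X^T t.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    log_k, alpha, beta = solve_linear_system(xtx, xty)
    return math.exp(log_k), alpha, beta


def predict(k: float, alpha: float, beta: float, n: float, c: float) -> float:
    """Evaluate the fitted power law at a new (N, C) pair."""
    return k * n ** alpha * c ** beta


if __name__ == "__main__":
    # Synthetic, noise-free data from a made-up law; NOT values from the paper.
    true_k, true_alpha, true_beta = 0.3, -0.25, 0.1
    grid = [
        (n, c, true_k * n ** true_alpha * c ** true_beta)
        for n in (1e7, 1e8, 1e9)
        for c in (1e18, 1e19, 1e20)
    ]
    k, alpha, beta = fit_power_law(grid)
    print(f"k={k:.3f} alpha={alpha:.3f} beta={beta:.3f}")
    print("predicted optimum at N=5e8, C=5e19:", predict(k, alpha, beta, 5e8, 5e19))
```

Fitting in log space turns the multiplicative power law into a linear model, so a few small-scale runs (a grid of N and C) are enough to extrapolate to a larger configuration. This is the general workflow the abstract motivates; in practice the fitted form, the noise in measured optima, and the extrapolation range all matter.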
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Optical Imaging Technologies
Methods: Diffusion
