Towards Precise Scaling Laws for Video Diffusion Transformers

Yuanyang Yin; Yaqi Zhao; Mingwu Zheng; Ke Lin; Jiarong Ou; Rui Chen; Victor Shea-Jay Huang; Jiahao Wang; Xin Tao; Pengfei Wan; Di Zhang; Baoqun Yin; Wentao Zhang; Kun Gai

arXiv:2411.17470 · cs.CV · January 3, 2025

Open Access

TL;DR

This paper systematically analyzes and confirms the existence of scaling laws for video diffusion transformers, proposing a new law to optimize hyperparameters and improve performance-cost trade-offs.

Contribution

It introduces a novel scaling law for video diffusion transformers that predicts optimal hyperparameters and enhances performance within compute constraints.

Findings

01

Confirmed the presence of scaling laws in video diffusion models

02

Discovered sensitivity to learning rate and batch size

03

Reduced inference costs by 40.1% under the same compute budget

Abstract

Achieving optimal performance of video diffusion transformers within a given data and compute budget is crucial due to their high training costs. This necessitates precisely determining the optimal model size and training hyperparameters before large-scale training. While scaling laws are employed in language models to predict performance, their existence and accurate derivation in visual generation models remain underexplored. In this paper, we systematically analyze scaling laws for video diffusion transformers and confirm their presence. Moreover, we discover that, unlike language models, video diffusion models are more sensitive to learning rate and batch size, two hyperparameters often not precisely modeled. To address this, we propose a new scaling law that predicts optimal hyperparameters for any model size and compute budget. Under these optimal settings, we achieve comparable…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Optical Imaging Technologies

Methods: Diffusion