Scaling Laws For Diffusion Transformers
Zhengyang Liang, Hao He, Ceyuan Yang, Bo Dai

TL;DR
This paper establishes and validates scaling laws for diffusion transformers, enabling precise predictions of model performance, optimal size, and data needs across different compute budgets, thereby improving efficiency in content generation tasks.
Contribution
First to empirically confirm power-law scaling laws in diffusion transformers, linking compute, model size, and performance for better predictability.
Findings
Pretraining loss follows a power law in compute.
Scaling laws enable accurate prediction of generation quality.
The trend of pretraining loss matches actual generation performance (e.g., FID) across datasets.
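As a rough illustration of the first two findings, a power law can be fitted by ordinary least squares in log-log space and then extrapolated to a larger budget. This is a minimal sketch, not the authors' fitting procedure: the function names, the coefficients `true_a` and `true_b`, and the data points are all made up for the example.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**b by least squares on (log compute, log loss)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget (FLOPs)."""
    return a * compute ** b

# Synthetic, noise-free points from hypothetical coefficients (not from the paper).
true_a, true_b = 2.0, -0.03
budgets = [1e17, 3e17, 1e18, 3e18, 6e18]
losses = [true_a * c ** true_b for c in budgets]

a, b = fit_power_law(budgets, losses)
extrapolated = predict_loss(a, b, 1e21)  # budget far beyond the fitted range
```

With real training runs the points are noisy, so the quality of the extrapolation depends on how well a straight line describes the data in log-log space.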
Abstract
Diffusion transformers (DiT) have already achieved appealing synthesis and scaling properties in content recreation, e.g., image and video generation. However, the scaling laws of DiT, which usually offer precise predictions regarding optimal model size and data requirements given a specific compute budget, are less explored. Therefore, experiments across a broad range of compute budgets, from 1e17 to 6e18 FLOPs, are conducted to confirm the existence of scaling laws in DiT for the first time. Concretely, the loss of pretraining DiT also follows a power-law relationship with the involved compute. Based on the scaling law, we can not only determine the optimal model size and required data but also accurately predict the text-to-image generation loss given a model with 1B parameters and a compute budget of 1e21 FLOPs. Additionally, we also demonstrate that the trend of pretraining loss matches…
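The relationships described in the abstract can be written in a generic form. The notation below is assumed for illustration: $C$ is the compute budget in FLOPs, $N_{\mathrm{opt}}$ and $D_{\mathrm{opt}}$ are the compute-optimal model size and data amount, and the coefficients $\beta$ and exponents $\alpha$ are placeholders. The abstract states a power law only for the loss; treating the optimal size and data as power laws too is a common assumption in scaling-law studies, not something stated there.

```latex
\begin{align}
L(C) &= \beta_L \, C^{\alpha_L}, \qquad \alpha_L < 0, \\
N_{\mathrm{opt}}(C) &= \beta_N \, C^{\alpha_N}, \\
D_{\mathrm{opt}}(C) &= \beta_D \, C^{\alpha_D}, \\
\log L(C) &= \log \beta_L + \alpha_L \log C .
\end{align}
```

The last line shows why such a law is a straight line in log-log space. In the setting of the abstract, the line is fitted on budgets from 1e17 to 6e18 FLOPs and evaluated at 1e21 FLOPs, which is $10^{21} / (6 \times 10^{18}) \approx 167$ times the largest fitted budget.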
Peer Reviews
Decision·ICLR 2026 Poster
The paper is very well written and easy to follow, and the method section is particularly well described. 1. The authors explore a very timely topic in diffusion-based transformer models. 2. The authors conduct experiments with DiT across various compute budgets and provide empirical evidence. 3. The paper also reports a correlation between pretraining loss and downstream FID, GenEval, and a human preference metric, which can be beneficial for practitioners.
1. This paper fixes a particular training dataset derived from LAION-5B. However, with generative models, we are observing that a careful data curation pipeline can affect the training and model scaling options. The authors could also study how noisy versus clean data affects scaling properties. 2. The power-law relationship between training budget and generation performance suggests that the scaling law can predict generation performance. However, it is a bit unknown if this law w
- This paper addresses an important problem: establishing scaling laws for text-to-image generation with diffusion models. The findings have the potential to offer valuable insights to the community. - The authors conduct extensive experiments and dedicate substantial effort to derive and validate the proposed scaling laws.
- In Figure 1, the assumption underlying the use of parabolic fitting for the performance curve is not clearly stated. If the curve is assumed to be unimodal, then a ternary search strategy could directly identify the optimal loss without requiring curve fitting. - In Figure 20 (GenEval results), only the value "10" appears on the y-axis, making it difficult to determine the specific GenEval scores associated with each data point. Providing a complete y-axis scale would significantly improve rea
* The paper is very well written. * The findings enable practitioners to tune the hyperparameters of DiTs more efficiently. * The paper demonstrates scaling laws not only with respect to the loss, but also for other useful metrics such as FID and human preference reward.
Beyond text-to-image generation, diffusion models have also been applied to tasks such as class-conditioned image generation and text-to-video generation. However, the paper only experiments on text-to-image generation, so it remains unclear whether the same scaling laws extend to these other tasks.
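The reviewer comment on Figure 1 contrasts parabolic fitting with ternary search for locating the loss minimum. The two can be sketched side by side as follows. Everything here is hypothetical: `toy_loss` is a made-up stand-in for loss as a function of log10 model size at a fixed compute budget, and neither function is taken from the paper.

```python
def ternary_search_min(f, lo, hi, tol=1e-6):
    """Locate the minimiser of a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # minimum cannot lie in (m2, hi]
        else:
            lo = m1   # minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

def parabola_vertex(p1, p2, p3):
    """x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den

def toy_loss(log10_params):
    """Hypothetical loss curve with its minimum at log10(N) = 8.3."""
    return 0.05 * (log10_params - 8.3) ** 2 + 0.5

# Parabola through three already-evaluated model sizes.
points = [(x, toy_loss(x)) for x in (7.0, 8.0, 9.0)]
vertex = parabola_vertex(*points)

# Ternary search, which queries the loss at new model sizes on every step.
searched = ternary_search_min(toy_loss, 7.0, 10.0)
```

One practical difference, under the assumption that each loss evaluation corresponds to a separate training run: ternary search needs new evaluations at points it chooses, while a parabola fit reuses a fixed set of runs but relies on the curve being close to quadratic near the minimum.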
Taxonomy
Topics: Theoretical and Computational Physics
