TL;DR
This paper introduces a method to estimate the optimal loss value in diffusion models, enabling better diagnosis of training quality and improved training schedules, with implications for understanding model scaling laws.
Contribution
It derives a closed-form optimal loss for diffusion models, develops scalable estimators, and demonstrates their use in diagnosing training and exploring scaling laws.
Findings
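Estimating the optimal loss gives an absolute reference for training quality: the gap between the training loss and the optimal loss separates a large optimal loss from insufficient model capacity.
A training schedule based on the optimal loss is reported to be more performant.
Subtracting the optimal loss from the training loss reveals clearer scaling-law patterns.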
Abstract
Diffusion models have achieved remarkable success in generative modeling. Despite more stable training, the loss of diffusion models is not indicative of absolute data-fitting quality, since its optimal value is typically not zero but unknown, leading to confusion between large optimal loss and insufficient model capacity. In this work, we advocate the need to estimate the optimal loss value for diagnosing and improving diffusion models. We first derive the optimal loss in closed form under a unified formulation of diffusion models, and develop effective estimators for it, including a stochastic variant scalable to large datasets with proper control of variance and bias. With this tool, we unlock the inherent metric for diagnosing the training quality of mainstream diffusion model variants, and develop a more performant training schedule based on the optimal loss. Moreover, using models…
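To make the claim that the optimal loss is "typically not zero" concrete, here is a minimal sketch, not the paper's estimator. It assumes a forward process x_t = alpha * x0 + sigma * eps, an unweighted x0-prediction loss, and a data distribution that is uniform over a finite dataset. Under those assumptions the loss-minimizing denoiser is the posterior mean E[x0 | x_t], a softmax-weighted average of the data points, and the optimal loss is its expected squared error. The function names `optimal_denoiser` and `estimate_optimal_loss` are hypothetical, and the estimate uses plain Monte Carlo over the full dataset rather than the scalable stochastic variant the abstract mentions.

```python
import numpy as np


def optimal_denoiser(x_t, data, alpha, sigma):
    """Posterior mean E[x0 | x_t] when x0 is uniform over the rows of `data`.

    x_t:  (batch, dim) noisy samples
    data: (n, dim) dataset
    """
    # Squared distance between every noisy sample and every scaled data point.
    diff = x_t[:, None, :] - alpha * data[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2)
    # Subtract the row-wise maximum so the softmax does not overflow.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ data


def estimate_optimal_loss(data, alpha, sigma, num_samples=4096, seed=0):
    """Monte Carlo estimate of E ||x0 - E[x0 | x_t]||^2 at one noise level."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, data.shape[0], size=num_samples)
    x0 = data[idx]
    x_t = alpha * x0 + sigma * rng.standard_normal(x0.shape)
    residual = x0 - optimal_denoiser(x_t, data, alpha, sigma)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))


if __name__ == "__main__":
    # Toy dataset: two points at -1 and +1 in one dimension.
    toy = np.array([[-1.0], [1.0]])
    for sigma in (0.05, 1.0, 100.0):
        print(sigma, estimate_optimal_loss(toy, alpha=1.0, sigma=sigma))
```

On this toy dataset the estimate is close to zero at very small noise, approaches the data variance at very large noise, and is strictly positive in between. A trained model's loss at a given noise level can only be judged against this floor, not against zero.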