ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin

TL;DR
This paper provides a theoretical analysis of UNet's instability in diffusion models, showing how long skip connection coefficients affect training stability, and proposes ScaleLong to improve robustness and accelerate training.
Contribution
The paper offers a theoretical understanding of UNet instability in diffusion models and introduces ScaleLong, a coefficient scaling method that enhances training stability and speed.
Findings
Theoretical analysis links long skip connection (LSC) coefficients to oscillations of hidden features and gradients during training.
Scaling LSC coefficients improves stability and robustness.
Experimental results show 1.5x faster training on multiple datasets.
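To make the idea of scaling LSC coefficients concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the toy blocks, the function names (`constant_scaling`, `make_block`, `toy_unet`), the geometric schedule `kappa**i`, and the order in which coefficients are assigned to skips are all assumptions made for this example.

```python
import math
import random


def constant_scaling(num_skips, kappa):
    """Hypothetical schedule: the i-th coefficient (0-indexed) is kappa**i."""
    return [kappa ** i for i in range(num_skips)]


def make_block(dim, rng):
    """Toy stand-in for a UNet block: a random dense layer followed by ReLU."""
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return block


def toy_unet(x, encoder, decoder, coeffs):
    """Forward pass in which each decoder block adds a scaled encoder feature."""
    skips = []
    h = x
    for block in encoder:
        h = block(h)
        skips.append(h)
    # Assumed pairing: the last encoder feature feeds the first decoder block,
    # and coeffs[j] scales the skip that enters decoder block j.
    for block, kappa, skip in zip(decoder, coeffs, reversed(skips)):
        h = block([a + kappa * s for a, s in zip(h, skip)])
    return h


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


rng = random.Random(0)  # fixed seed so the toy weights are reproducible
dim, depth = 8, 3
encoder = [make_block(dim, rng) for _ in range(depth)]
decoder = [make_block(dim, rng) for _ in range(depth)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

out_unit = toy_unet(x, encoder, decoder, [1.0] * depth)
out_scaled = toy_unet(x, encoder, decoder, constant_scaling(depth, 0.7))
print("output norm, all coefficients 1.0:", l2_norm(out_unit))
print("output norm, scaled coefficients: ", l2_norm(out_scaled))
```

Setting every coefficient to 0 reduces the sketch to a plain feedforward chain, and setting them all to 1 gives the unscaled skip connections. This toy network is far too small to reproduce the stability results the paper reports, so the printed norms only show where the coefficients enter the forward pass.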
Abstract
In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling down its LSC coefficients. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement from LSC scaling, is still missing. To address this, we theoretically show that the coefficients of LSCs in UNet have a large effect on the stability of forward and backward propagation and on the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is also provably…
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Opinion Dynamics and Social Influence · Complex Network Analysis Techniques
Methods: Diffusion
