StableGrad: Backward Scale Control without Batch Normalization
Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Enrique S. Quintana-Ortí

TL;DR
StableGrad is an optimizer-based scale control method that stabilizes training of deep neural networks and PINNs without modifying the forward pass or using batch normalization.
Contribution
It introduces a novel optimizer-level scale correction mechanism that maintains stable training dynamics without altering the network architecture or forward computations.
Findings
StableGrad improves accuracy of deep PINNs without batch normalization.
It stabilizes training of ResNet and EfficientNet architectures without normalization layers.
The method makes training more reliable and allows deeper networks to be trained.
Abstract
Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often mitigate this problem through Batch Normalization, residual connections, or other normalization layers, which repeatedly re-scale or bypass intermediate representations. However, these mechanisms are not always appropriate. In Physics-Informed Neural Networks (PINNs), the network represents a continuous physical field and its input derivatives define the training objective, making batch-dependent normalization problematic because it can introduce non-local dependencies into the predicted field and its derivatives. We propose StableGrad, an optimizer-level scale-control mechanism that corrects layer-wise weight-gradient imbalances without…
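The abstract's point about batch-dependent normalization in PINNs can be illustrated with a small NumPy sketch. This is not code from the paper: the network `tiny_net`, the helper `batch_norm`, the layer sizes, and the sampled collocation points are all hypothetical choices made for illustration. It evaluates the same input point `x0` inside two different batches and shows that, with training-mode batch normalization, the predicted value at `x0` depends on which other points share the batch.

```python
import numpy as np

def batch_norm(h, eps=1e-5):
    """Training-mode batch normalization: per-feature statistics over the batch axis."""
    mean = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def tiny_net(x, w1, w2, normalize):
    """Hypothetical one-hidden-layer field u(x) = tanh(norm(x @ w1)) @ w2."""
    h = x @ w1
    if normalize:
        h = batch_norm(h)
    return np.tanh(h) @ w2

rng = np.random.default_rng(0)
w1 = rng.normal(size=(1, 8))   # input -> hidden weights (illustrative sizes)
w2 = rng.normal(size=(8, 1))   # hidden -> output weights

# The same query point placed in two batches with different companions.
x0 = np.array([[0.3]])
batch_a = np.vstack([x0, rng.uniform(0.0, 1.0, size=(15, 1))])
batch_b = np.vstack([x0, rng.uniform(5.0, 6.0, size=(15, 1))])

# Without normalization, u(x0) is a function of x0 alone.
u_plain_a = tiny_net(batch_a, w1, w2, normalize=False)[0, 0]
u_plain_b = tiny_net(batch_b, w1, w2, normalize=False)[0, 0]

# With batch normalization, u(x0) also depends on the rest of the batch.
u_bn_a = tiny_net(batch_a, w1, w2, normalize=True)[0, 0]
u_bn_b = tiny_net(batch_b, w1, w2, normalize=True)[0, 0]

print("plain:", u_plain_a, u_plain_b)
print("batch-normalized:", u_bn_a, u_bn_b)
```

Because a PINN loss is built from derivatives of the predicted field with respect to its inputs, this kind of cross-sample coupling would also enter those derivatives, which is the non-local dependency the abstract describes.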
