StableGrad: Backward Scale Control without Batch Normalization

Jose I. Mestre; Alberto Fernández-Hernández; Cristian Pérez-Corral; Manuel F. Dolz; Enrique S. Quintana-Ortí

arXiv:2605.19856 · cs.LG · May 20, 2026

TL;DR

StableGrad is an optimizer-based scale control method that stabilizes training of deep neural networks and PINNs without modifying the forward pass or using batch normalization.

Contribution

It introduces a novel optimizer-level scale correction mechanism that maintains stable training dynamics without altering the network architecture or forward computations.
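To make the idea of an optimizer-level correction concrete, here is a minimal sketch of one way to equalize layer-wise gradient scales before a plain SGD update. It is not StableGrad's actual rule, which this summary does not spell out; the geometric-mean target, the function names `balance_layer_grads` and `sgd_step`, and the toy gradients are all assumptions made for illustration. The point it shows is only that the correction lives in the update step, so the forward pass and the architecture stay untouched.

```python
import math

def grad_norm(grad):
    """Euclidean norm of one layer's gradient, stored as a flat list."""
    return math.sqrt(sum(g * g for g in grad))

def balance_layer_grads(grads, eps=1e-12):
    """Rescale every layer's gradient to a shared norm.

    Hypothetical rule (an assumption, not the paper's): the shared target
    is the geometric mean of the per-layer gradient norms.
    """
    norms = [grad_norm(g) for g in grads]
    target = math.exp(sum(math.log(n + eps) for n in norms) / len(norms))
    return [[g * target / (n + eps) for g in layer]
            for layer, n in zip(grads, norms)]

def sgd_step(weights, grads, lr=0.1):
    """Plain SGD update applied to the rebalanced gradients."""
    balanced = balance_layer_grads(grads)
    return [[w - lr * g for w, g in zip(w_layer, g_layer)]
            for w_layer, g_layer in zip(weights, balanced)]

# Toy gradients for three layers whose scales span eight orders of magnitude.
grads = [[3e-6, 4e-6], [3.0, 4.0], [300.0, 400.0]]
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

balanced = balance_layer_grads(grads)
balanced_norms = [grad_norm(g) for g in balanced]
new_weights = sgd_step(weights, grads)
print(balanced_norms)
```

After rebalancing, all three layers receive updates of the same magnitude while each gradient keeps its direction. A real method would need to decide what the target scale should be and how it interacts with momentum or adaptive optimizers, which this sketch deliberately ignores.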

Findings

01. StableGrad improves the accuracy of deep PINNs without batch normalization.

02. It stabilizes training of ResNet and EfficientNet architectures without normalization layers.

03. The method makes training more reliable and allows deeper networks to be trained.

Abstract

Training very deep neural networks requires controlling the propagation of magnitudes across depth. Without such control, activations and gradients may vanish, explode, or enter unstable regimes that make optimization fail. Modern architectures often mitigate this problem through Batch Normalization, residual connections, or other normalization layers, which repeatedly re-scale or bypass intermediate representations. However, these mechanisms are not always appropriate. In Physics-Informed Neural Networks (PINNs), the network represents a continuous physical field and its input derivatives define the training objective, making batch-dependent normalization problematic because it can introduce non-local dependencies into the predicted field and its derivatives. We propose StableGrad, an optimizer-level scale-control mechanism that corrects layer-wise weight-gradient imbalances without…
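The non-local dependency mentioned for PINNs can be seen in a small toy example. The sketch below is an illustration under stated assumptions, not code from the paper: the one-feature "network" `field`, its weights, and the sample points are invented. It evaluates the same query point alongside two different sets of collocation points and shows that batch statistics change the predicted value at that point.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalars using the batch's own mean and variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def field(xs):
    """Toy network (hypothetical): linear feature, batch norm, then tanh."""
    pre = [2.0 * x + 0.5 for x in xs]
    return [math.tanh(h) for h in batch_norm(pre)]

# The same query point x = 0.3, batched with two different sets of companions.
u_a = field([0.3, 0.0, 1.0])[0]
u_b = field([0.3, 5.0, -4.0])[0]
print(u_a, u_b)
```

The two printed values differ even though the query point is identical, so the predicted field at a point is no longer a function of that point alone. Derivatives with respect to the input inherit the same dependence, which is why a batch-dependent layer sits awkwardly with a loss built from those derivatives.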
