Analysis on Gradient Propagation in Batch Normalized Residual Networks
Abhishek Panigrahi, Yueru Chen, C.-C. Jay Kuo

TL;DR
This paper provides a mathematical analysis of how batch normalization influences gradient propagation in residual networks, demonstrating its role in preventing gradient vanishing or explosion during training.
Contribution
It offers a theoretical understanding of BN's effect on gradient variance in residual networks, highlighting its importance in stable training.
Findings
BN confines gradient variance across residual blocks
Prevents gradient vanishing/explosion in residual networks
Shows the importance of BN relative to the residual branches
Abstract
We conduct a mathematical analysis of the effect of batch normalization (BN) on gradient backpropagation in residual network training; BN is believed to play a critical role in addressing the gradient vanishing/explosion problem. By analyzing the mean and variance behavior of the input and the gradient in the forward and backward passes through the BN and residual branches, respectively, we show that they work together to confine the gradient variance to a certain range across residual blocks in backpropagation. As a result, the gradient vanishing/explosion problem is avoided. We also show the relative importance of batch normalization w.r.t. the residual branches in residual networks.
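The abstract's claim about gradient variance can be probed numerically. The following is a minimal sketch, not the authors' experimental setup: it assumes a toy fully connected residual block of the form x + ReLU(BN(x)) W with He-initialized weights, no learnable BN scale or shift, and a random Gaussian gradient injected at the output. The function name `grad_growth` and all of its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def grad_growth(num_blocks=20, width=256, batch=256, use_bn=True, seed=0):
    """Ratio of the mean squared gradient at the network input to that at the output.

    Toy model (an assumption, not the paper's architecture): each block computes
    x + relu(BN(x)) @ W, or x + relu(x) @ W when use_bn is False.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    cache = []

    # Forward pass: store what the backward pass needs for each block.
    for _ in range(num_blocks):
        if use_bn:
            std = np.sqrt(x.var(axis=0) + 1e-5)   # per-feature batch statistics
            z = (x - x.mean(axis=0)) / std
        else:
            std, z = None, x
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)  # He init
        x = x + np.maximum(z, 0.0) @ w
        cache.append((z, std, w))

    # Backward pass: start from a random gradient at the output (an assumption).
    g = rng.standard_normal((batch, width))
    g_top = np.mean(g ** 2)
    for z, std, w in reversed(cache):
        gz = (g @ w.T) * (z > 0)                  # through the linear layer and ReLU
        if use_bn:
            # Standard BN backward formula without affine parameters.
            gz = (gz - gz.mean(axis=0) - z * (gz * z).mean(axis=0)) / std
        g = g + gz                                # skip connection adds the identity path
    return np.mean(g ** 2) / g_top

plain_ratio = grad_growth(use_bn=False)
bn_ratio = grad_growth(use_bn=True)
print(f"gradient growth without BN: {plain_ratio:.3g}")
print(f"gradient growth with BN:    {bn_ratio:.3g}")
```

In this toy setting the gradient magnitude without BN should grow roughly geometrically with depth, while the BN variant should stay within a much narrower range. That is consistent with the confinement described above, although the sketch does not reproduce the paper's derivation.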
Taxonomy
Topics: Geophysical Methods and Applications · Ultrasonics and Acoustic Wave Propagation · Rock Mechanics and Modeling
Methods: Batch Normalization
