Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse
Yuxiang Liu, Jidong Ge, Chuanyi Li, and Jie Gui

TL;DR
This paper investigates how Batch Normalization (BN) affects variance transmission in neural networks, identifies the shift of the average gradient as a problem, and proposes Parametric Weights Standardization (PWS), a fast and effective alternative that speeds up training without normalizing layer outputs.
Contribution
The paper introduces PWS, a novel method that addresses gradient shift issues, improves training speed, and offers a new perspective on BN's effectiveness through variance transmission analysis.
Findings
PWS speeds up network convergence without output normalization.
The shift of the average gradient amplifies the variance of every convolutional layer.
PWS is computationally efficient and robust to mini-batch size.
Abstract
Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weights Normalization (WN). In this work, we demonstrate that the shift of the average gradient amplifies the variance of every convolutional (conv) layer. To solve this problem, we propose Parametric Weights Standardization (PWS), a fast module for conv filters that is robust to the mini-batch size. PWS provides the speed-up of BN while requiring less computation, and it does not change the output of a conv layer. PWS enables the network to converge fast without normalizing the outputs. This result enhances the persuasiveness of the shift…
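The abstract does not give the PWS formula. As a hedged illustration only, the NumPy sketch below assumes PWS behaves like weight standardization: each conv filter is centered to zero mean and scaled to unit variance, then rescaled by a learnable per-filter gain (the name `gain` and this exact parameterization are assumptions, not the authors' definition). It shows two properties that plausibly relate to the claims above: the statistics are computed from the weights alone, so they do not depend on the mini-batch, and backpropagating through the centering step removes the average (per-filter mean) component of the gradient that reaches the raw weights.

```python
import numpy as np


def standardize_filters(w, gain, eps=1e-5):
    """Per-output-filter weight standardization with a learnable gain.

    ASSUMPTION: a weight-standardization-style stand-in for PWS, not the
    paper's exact method.
    w:    raw conv weights, shape (out_channels, in_channels, kh, kw)
    gain: hypothetical learnable per-filter scale, shape (out_channels,)
    """
    out_ch = w.shape[0]
    flat = w.reshape(out_ch, -1)                  # one row per output filter
    mean = flat.mean(axis=1, keepdims=True)
    std = np.sqrt(flat.var(axis=1, keepdims=True) + eps)
    w_hat = (flat - mean) / std                   # zero mean, ~unit variance per filter
    w_std = gain[:, None] * w_hat                 # no input data involved: batch-independent
    return w_std.reshape(w.shape), (w_hat, std)


def backward_to_raw(grad_w_std, gain, cache):
    """Chain rule from dL/dW_std back to dL/dW for the raw weights."""
    w_hat, std = cache
    out_ch = grad_w_std.shape[0]
    g = grad_w_std.reshape(out_ch, -1) * gain[:, None]   # dL/dw_hat
    # Backward pass of standardization: subtracting g.mean() removes the
    # average gradient of each filter, and the last term removes the
    # component along w_hat, so every row of the result has zero mean.
    grad = (g - g.mean(axis=1, keepdims=True)
            - w_hat * (g * w_hat).mean(axis=1, keepdims=True)) / std
    return grad.reshape(grad_w_std.shape)


rng = np.random.default_rng(0)
w = rng.normal(0.3, 1.0, size=(8, 4, 3, 3))    # raw weights with a shifted mean
gain = np.ones(8)
w_std, cache = standardize_filters(w, gain)

# An upstream gradient drawn with a nonzero average (mean 0.5).
upstream = rng.normal(0.5, 1.0, size=w.shape)
grad_raw = backward_to_raw(upstream, gain, cache)

print(np.abs(w_std.reshape(8, -1).mean(axis=1)).max())     # near 0: filters centered
print(np.abs(upstream.reshape(8, -1).mean(axis=1)).mean())  # nonzero average gradient
print(np.abs(grad_raw.reshape(8, -1).mean(axis=1)).max())   # near 0: shift removed
```

The backward formula has the same form as the BN backward pass, but it is applied over the weights of one filter rather than over the activations of a mini-batch, which is why the cost is independent of batch size.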
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Batch Normalization
