Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory
Elaina Chai, Mert Pilanci, Boris Murmann

TL;DR
This paper applies classical adaptive filter theory to analyze how Batch Normalization affects CNN training speed and stability. It shows that the two benefits are distinct effects that dominate at different learning rates, and it ties both to the eigenvalues of the input autocorrelation matrices.
Contribution
It introduces a new perspective that connects BatchNorm's effects to the eigenvalue behavior studied in adaptive filter theory, and shows that BatchNorm plays a dual role, improving convergence speed and training stability through separate mechanisms.
Findings
BatchNorm influences the eigenvalues of the input autocorrelation matrices.
At low learning rates, BatchNorm accelerates convergence by amplifying small eigenvalues.
At high learning rates, BatchNorm stabilizes training by suppressing large eigenvalues.
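The eigenvalue findings can be illustrated with a toy example. This is a minimal sketch, not the authors' experiment: it assumes three independent, zero-mean input channels with very different scales and applies plain per-channel standardization (BatchNorm without its learned scale and shift). The helper names `autocorr_eigs` and `channel_standardize` are hypothetical. In this idealized case, standardization raises the smallest eigenvalue of the autocorrelation matrix and lowers the largest one.

```python
import numpy as np

def autocorr_eigs(x):
    """Eigenvalues (ascending) of the sample autocorrelation matrix R = E[x x^T].

    x has shape (num_samples, num_channels); each row is one observation.
    """
    r = x.T @ x / x.shape[0]
    return np.linalg.eigvalsh(r)

def channel_standardize(x, eps=1e-5):
    """BatchNorm-style per-channel normalization, without learned scale or shift."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Hypothetical toy input: three independent zero-mean channels with very different scales.
scales = np.array([10.0, 1.0, 0.1])
x = rng.standard_normal((10_000, 3)) * scales

before = autocorr_eigs(x)
after = autocorr_eigs(channel_standardize(x))
print("eigenvalues before:", np.round(before, 3))
print("eigenvalues after: ", np.round(after, 3))
print("spread before:", before[-1] / before[0], "spread after:", after[-1] / after[0])
```

Here the eigenvalue spread drops from the order of 10^4 to close to 1. Real CNN activations are correlated across channels, and the paper's analysis works through the channel-wise structure of convolution layers, so this example only illustrates the direction of the effect.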
Abstract
Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the…
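The "natural modes" in the abstract come from the textbook steepest-descent analysis of adaptive filters. For a linear filter minimizing J(w) = ½ wᵀRw − pᵀw, the update w ← w − μ(Rw − p) moves the error along each eigenvector of R independently, scaling it by (1 − μλ_k) per step. Every mode is stable only if 0 < μ < 2/λ_max, and the slowest mode is governed by λ_min. The sketch below checks this numerically for a hypothetical 2×2 autocorrelation matrix R with eigenvalues 1 and 100; it is the classical single-filter case, not the paper's derivation for convolution layers, and `simulate_modes` and `predict_modes` are hypothetical names.

```python
import numpy as np

def simulate_modes(r, p, w0, mu, steps):
    """Run steepest descent w <- w - mu * (R w - p) and return the error
    w_n - w_opt expressed in the eigenbasis of R (the natural modes)."""
    lam, q = np.linalg.eigh(r)      # R = Q diag(lam) Q^T, lam ascending
    w_opt = np.linalg.solve(r, p)   # Wiener solution: R w_opt = p
    w = w0.copy()
    for _ in range(steps):
        w = w - mu * (r @ w - p)
    return lam, q.T @ (w - w_opt)

def predict_modes(r, p, w0, mu, steps):
    """Closed form: mode k is multiplied by (1 - mu * lambda_k) at every step."""
    lam, q = np.linalg.eigh(r)
    v0 = q.T @ (w0 - np.linalg.solve(r, p))
    return (1.0 - mu * lam) ** steps * v0

r = np.array([[50.5, 49.5], [49.5, 50.5]])  # hypothetical R, eigenvalues 1 and 100
p = np.array([1.0, 2.0])
w0 = np.zeros(2)
steps = 50
results = {}
for mu in (0.005, 0.019, 0.021):  # stability bound: mu < 2 / lambda_max = 0.02
    lam, simulated = simulate_modes(r, p, w0, mu, steps)
    results[mu] = (simulated, predict_modes(r, p, w0, mu, steps))
    print(f"mu={mu}: per-step factors {1.0 - mu * lam}, mode errors {simulated}")
```

With μ = 0.005 the λ = 1 mode barely moves in 50 steps (factor 0.995), with μ = 0.019 both modes converge, and with μ = 0.021 the λ = 100 mode diverges (factor −1.1). This mirrors the trade-off the abstract describes: small eigenvalues limit speed, large eigenvalues limit the usable learning rate.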
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution
