Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory

Elaina Chai; Mert Pilanci; Boris Murmann

arXiv:2002.10674 · cs.NE · June 2, 2021 · 1 citation

TL;DR

This paper uses classical adaptive filter theory to analyze how Batch Normalization affects CNN training speed and stability. It ties both effects to the eigenvalues of the input autocorrelation matrices and shows that they are distinct effects that appear at different learning rates.

Contribution

The paper connects BatchNorm's effects to the eigenvalue behavior studied in adaptive filter theory and shows experimentally that its speed benefit and its stability benefit are separate effects.

Findings

01 BatchNorm controls the eigenvalues of the input autocorrelation matrices through the convolution layers' channel-wise structure.

02 At low learning rates, BatchNorm accelerates convergence by amplifying the smallest eigenvalues.

03 At high learning rates, BatchNorm stabilizes training by suppressing the largest eigenvalues.
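
Findings 02 and 03 can be illustrated with the textbook steepest-descent result from adaptive filter theory: on a quadratic cost, the error along eigenvector i of the autocorrelation matrix shrinks by a factor of |1 - mu * lambda_i| per step, so the smallest eigenvalue sets the speed and the largest sets the stability limit mu < 2 / lambda_max. The sketch below is only an illustration of that standard result, not the paper's derivation; the function names and the eigenvalue and learning-rate values are made up for the example.

```python
import numpy as np

def mode_factors(eigenvalues, mu):
    """Per-step contraction factor |1 - mu * lambda_i| of each natural mode."""
    return np.abs(1.0 - mu * np.asarray(eigenvalues, dtype=float))

def steps_to_converge(eigenvalues, mu, tol=1e-3):
    """Steps until the slowest mode decays below tol; inf if any mode fails to decay."""
    rho = mode_factors(eigenvalues, mu).max()
    if rho >= 1.0:
        return float("inf")   # some mode grows or never decays: unstable
    if rho == 0.0:
        return 1.0            # every mode is cancelled in a single step
    return float(np.ceil(np.log(tol) / np.log(rho)))

# Low learning rate: the smallest eigenvalue limits speed (hypothetical spectra).
slow = steps_to_converge([0.01, 1.0], mu=0.1)
fast = steps_to_converge([0.10, 1.0], mu=0.1)   # smallest eigenvalue amplified
print("low mu, steps before/after amplification:", slow, fast)

# High learning rate: the largest eigenvalue limits stability (mu < 2 / lambda_max).
unstable = steps_to_converge([0.1, 1.0], mu=2.5)
stable = steps_to_converge([0.1, 0.5], mu=2.5)  # largest eigenvalue suppressed
print("high mu, steps before/after suppression:", unstable, stable)
```

In this toy setting, raising the smallest eigenvalue shortens convergence at a fixed low learning rate, while lowering the largest eigenvalue turns a divergent high learning rate into a convergent one.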

Abstract

Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the…
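
As a rough illustration of how per-channel normalization can reshape the eigenvalue spectrum of an input autocorrelation matrix, the sketch below compares the eigenvalues before and after standardizing each channel. It is a simplified stand-in under stated assumptions, not the paper's experiment: the data are synthetic, the channels are independent, and `standardize_channels` applies only the mean and variance step of BatchNorm without the learned scale and shift. All names and values are hypothetical.

```python
import numpy as np

def autocorr_eigs(x):
    """Eigenvalues (ascending) of the sample autocorrelation matrix R = x^T x / N."""
    r = x.T @ x / x.shape[0]
    return np.linalg.eigvalsh(r)

def standardize_channels(x, eps=1e-5):
    """Zero mean and unit variance per channel (no learned scale or shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
# Synthetic activations: 4 channels with very different scales and non-zero means.
scales = np.array([0.05, 0.5, 2.0, 10.0])
means = np.array([3.0, -1.0, 0.5, 4.0])
x = rng.normal(size=(10_000, 4)) * scales + means

raw = autocorr_eigs(x)
normed = autocorr_eigs(standardize_channels(x))
print("raw eigenvalues:   ", raw)
print("normed eigenvalues:", normed)
print("eigenvalue spread raw vs normed:", raw[-1] / raw[0], normed[-1] / normed[0])
```

For this synthetic input, standardization raises the smallest eigenvalue and lowers the largest one, which narrows the spread between them. Real convolution inputs have correlated channels and spatial structure, so the effect there need not be this clean.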

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution