Theoretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate

Lakshmi Annamalai; Chetan Singh Thakur

arXiv:2209.07587 · stat.ML · October 19, 2022

Open Access

TL;DR

This paper provides a theoretical explanation for how batch normalization automatically adjusts the regularization parameter based on data statistics, supported by analytical proofs and empirical validation.

Contribution

It identifies a data-dependent auto-tuning of the regularization parameter performed by batch normalization, with analytical proofs and empirical evidence.

Findings

01. BN auto-tunes regularization based on data statistics

02. Analytical proof of BN behavior under noisy inputs

03. Empirical validation on MNIST dataset

Abstract

Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks are notoriously difficult to train, demanding careful weight initialization, lower learning rates, and so on. Batch Normalization (BN) addresses these issues by normalizing the inputs of activations to zero mean and unit standard deviation. Making this normalization part of the training process dramatically accelerates the training of very deep networks. An active line of research seeks the exact theoretical explanation for the success of BN. Most of these theoretical insights attribute the benefits of BN to its influence on optimization, weight scale invariance, and regularization. Despite BN's undeniable success in accelerating generalization, the…
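The normalization step the abstract describes (rescaling the inputs of activations to zero mean and unit standard deviation over a batch) can be sketched as follows. This is a minimal illustration of the standard BN forward pass in training mode, not the authors' code; the function name `batch_norm`, the scalar `gamma` and `beta`, the `eps` value, and the toy batch are all assumptions made for the example.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of `batch` (a list of equal-length rows).

    gamma and beta stand in for the learnable scale and shift; they are
    scalars here for simplicity (an assumption of this sketch).
    """
    n = len(batch)
    columns = list(zip(*batch))
    # Per-feature statistics computed over the batch dimension.
    means = [sum(col) / n for col in columns]
    variances = [
        sum((v - m) ** 2 for v in col) / n
        for col, m in zip(columns, means)
    ]
    # Subtract the batch mean, divide by the batch standard deviation,
    # then apply the scale and shift.
    return [
        [
            gamma * (v - m) / math.sqrt(var + eps) + beta
            for v, m, var in zip(row, means, variances)
        ]
        for row in batch
    ]

# Toy batch: 4 samples, 2 features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(batch)

# Per-feature mean of the normalized batch.
col_means = [sum(col) / len(col) for col in zip(*normalized)]
# Root mean square per feature; equals the standard deviation
# because the column means are (numerically) zero.
col_stds = [
    math.sqrt(sum(v * v for v in col) / len(col))
    for col in zip(*normalized)
]
```

After normalization both features have mean close to zero and standard deviation close to one, regardless of their original scale. The small `eps` term keeps the division stable and makes the resulting standard deviation very slightly less than one.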

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: Batch Normalization