Theoretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate
Lakshmi Annamalai, Chetan Singh Thakur

TL;DR
This paper provides a theoretical explanation for how batch normalization automatically adjusts the regularization parameter based on data statistics, supported by analytical proofs and empirical validation.
Contribution
It identifies a data-dependent auto-tuning mechanism for the regularization parameter in batch normalization, supported by analytical proofs and empirical evidence.
Findings
BN auto-tunes regularization based on data statistics
Analytical proof of BN behavior under noisy inputs
Empirical validation on the MNIST dataset
Abstract
Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks are notoriously difficult to train: they require careful weight initialization, lower learning rates, and so on. Batch Normalization (BN) addresses these issues by normalizing the inputs of activations to zero mean and unit standard deviation. Making batch normalization part of the training process dramatically accelerates the training of very deep networks. An active line of research seeks the exact theoretical explanation behind the success of BN. Most of these theoretical insights attribute the benefits of BN to its influence on optimization, weight scale invariance, and regularization. Despite BN's undeniable success in accelerating generalization, the…
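As a rough illustration of the normalization step the abstract describes (zero mean and unit standard deviation per feature), here is a minimal NumPy sketch. It is not the authors' code: the function name `batch_norm`, the learnable scale `gamma` and shift `beta`, the stabilizer `eps`, and the synthetic input batch are all assumptions taken from the standard formulation of batch normalization, not from this summary.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization over a mini-batch.

    x has shape (batch_size, num_features). gamma, beta and eps are
    illustrative defaults (assumptions), not values from the paper.
    """
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature (biased) batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, approximately unit std
    return gamma * x_hat + beta               # optional learnable rescaling

# Synthetic activations with non-zero mean and non-unit scale (hypothetical data).
rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
out = batch_norm(batch)
```

With the default `gamma=1.0` and `beta=0.0`, each column of `out` has mean close to 0 and standard deviation close to 1, whatever the scale of the input batch. This sketch covers only the forward pass on a single batch; running statistics for inference are omitted.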
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
Methods: Batch Normalization
