Batch Normalization Decomposed
Ido Nachum, Marco Bondaschi, Michael Gastpar, Anatoly Khina

TL;DR
This paper analyzes the effects of recentering and non-linearity in batch normalization, revealing a clustering behavior at initialization and providing geometric and stability insights into this phenomenon.
Contribution
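It extends the earlier analysis of deep linear networks with rescaling only (Daneshmand, Joudaki, and Bach, NeurIPS '21) by examining the other two components of batch normalization, recentering and the non-linearity, and their effect on network representations at initialization.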
Findings
At initialization, representations converge through the layers to a single cluster plus an outlier.
The clustering behavior is stable under certain conditions.
Geometric analysis explains the evolution of representations.
Abstract
Batch normalization is a successful building block of neural network architectures. Yet, it is not well understood. A neural network layer with batch normalization comprises three components that affect the representation induced by the network: recentering the mean of the representation to zero, rescaling the variance of the representation to one, and finally applying a non-linearity. Our work follows the work of Hadi Daneshmand, Amir Joudaki, Francis Bach [NeurIPS '21], which studied deep linear neural networks with only the rescaling stage between layers at initialization. In our work, we present an analysis of the other two key components of networks with batch normalization, namely, the recentering and the non-linearity. When these two components are present, we observe a curious behavior at initialization. Through the layers, the representation…
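The three components named in the abstract can be sketched as separate functions. This is a minimal illustration, not the authors' code: the Gaussian weight initialization, the ReLU non-linearity, the `eps` constant, and all function names (`recenter`, `rescale`, `relu`, `random_linear`, `bn_layer`) are assumptions made for the example.

```python
import math
import random


def recenter(batch):
    """Subtract the per-feature batch mean, so every feature has mean zero."""
    means = [sum(col) / len(col) for col in zip(*batch)]
    return [[v - m for v, m in zip(row, means)] for row in batch]


def rescale(batch, eps=1e-5):
    """Divide each feature by its batch standard deviation (variance close to one).

    eps is a small constant (an assumption here) that avoids division by zero.
    """
    stds = []
    for col in zip(*batch):
        m = sum(col) / len(col)
        var = sum((v - m) ** 2 for v in col) / len(col)
        stds.append(math.sqrt(var + eps))
    return [[v / s for v, s in zip(row, stds)] for row in batch]


def relu(batch):
    """Element-wise non-linearity; ReLU is an assumed choice for this sketch."""
    return [[max(0.0, v) for v in row] for row in batch]


def random_linear(batch, width_out, rng):
    """Multiply by a random matrix with i.i.d. Gaussian entries of variance
    1 / width_in (an assumed initialization scheme)."""
    width_in = len(batch[0])
    scale = 1.0 / math.sqrt(width_in)
    w = [[rng.gauss(0.0, scale) for _ in range(width_out)] for _ in range(width_in)]
    return [
        [sum(row[i] * w[i][j] for i in range(width_in)) for j in range(width_out)]
        for row in batch
    ]


def bn_layer(batch, width_out, rng):
    """One layer at initialization: linear map, recenter, rescale, non-linearity."""
    z = random_linear(batch, width_out, rng)
    z = recenter(z)
    z = rescale(z)
    return relu(z)


if __name__ == "__main__":
    rng = random.Random(0)
    # A batch of 4 points in 8 dimensions, pushed through 10 random layers.
    h = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
    for _ in range(10):
        h = bn_layer(h, 8, rng)
    print(len(h), len(h[0]))
```

Keeping the steps separate makes it easy to drop `recenter` and `relu` to recover the rescaling-only linear setting of the earlier work, or to track pairwise distances between rows of `h` from layer to layer.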
Taxonomy
Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization
Methods: Batch Normalization
