Batch Normalization Decomposed

Ido Nachum; Marco Bondaschi; Michael Gastpar; Anatoly Khina

arXiv:2412.02843 · cs.LG · December 5, 2024

Open Access

TL;DR

This paper analyzes the effects of recentering and non-linearity in batch normalization, revealing a clustering behavior at initialization and providing geometric and stability insights into this phenomenon.

Contribution

It extends previous linear network analysis by examining the recentering and non-linearity components of batch normalization, uncovering their impact on network representations.

Findings

01 Representations converge to a single cluster with an outlier at initialization.

02 The clustering behavior is stable under certain conditions.

03 Geometric analysis explains the evolution of representations.

Abstract

Batch normalization is a successful building block of neural network architectures. Yet, it is not well understood. A neural network layer with batch normalization comprises three components that affect the representation induced by the network: recentering the mean of the representation to zero, rescaling the variance of the representation to one, and finally applying a non-linearity. Our work follows the work of Hadi Daneshmand, Amir Joudaki, and Francis Bach [NeurIPS '21], which studied deep linear neural networks with only the rescaling stage between layers at initialization. In our work, we present an analysis of the other two key components of networks with batch normalization, namely, the recentering and the non-linearity. When these two components are present, we observe a curious behavior at initialization. Through the layers, the representation…
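The three components named in the abstract can be sketched as separate functions and composed into one layer. This is a minimal illustration, not the authors' code: the ReLU non-linearity, the Gaussian weights scaled by 1/sqrt(width), the batch size, width, depth, and the helper names `recenter`, `rescale`, `relu`, and `bn_layer` are all assumptions chosen for the example.

```python
import numpy as np

def recenter(h):
    # Recentering: subtract each feature's mean over the batch (axis 0).
    return h - h.mean(axis=0, keepdims=True)

def rescale(h, eps=1e-5):
    # Rescaling: divide each feature by its batch standard deviation.
    # eps is an assumed small constant that guards against division by zero.
    return h / np.sqrt(h.var(axis=0, keepdims=True) + eps)

def relu(h):
    # Non-linearity (assumed to be ReLU for this sketch).
    return np.maximum(h, 0.0)

def bn_layer(x, w):
    # One layer: linear map, then recenter, rescale, and non-linearity,
    # in the order listed in the abstract.
    return relu(rescale(recenter(x @ w)))

rng = np.random.default_rng(0)
batch, width, depth = 8, 64, 20  # assumed sizes, for illustration only

# Rows of x are the representations of the batch samples.
x = rng.standard_normal((batch, width))
for _ in range(depth):
    # Fresh random weights at every layer, i.e. the network at initialization.
    w = rng.standard_normal((width, width)) / np.sqrt(width)
    x = bn_layer(x, w)

# Pairwise cosine similarities between the final representations; inspecting
# this matrix is one way to look at how the samples relate after many layers.
norms = np.linalg.norm(x, axis=1, keepdims=True) + 1e-12
cosine = (x / norms) @ (x / norms).T
```

Dropping `recenter` or replacing `relu` with the identity in `bn_layer` gives the reduced variants, so the effect of each component on `cosine` can be compared in isolation.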

Taxonomy

Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization

Methods: Batch Normalization