Batch normalization does not improve initialization

Joris Dannemann; Gero Junike

arXiv:2502.17913·cs.LG·February 26, 2025

Open Access

TL;DR

This paper challenges the common belief that batch normalization improves neural network initialization, giving a counterexample to the claim made by Santurkar and Tsipras (2018).

Contribution

The paper presents a counterexample to disprove the claim that batch normalization improves neural network initialization.

Findings

01

Counterexample shows batch normalization does not improve initialization

02

Challenges prior theoretical claims about batch normalization's role in initialization

03

Highlights the need to reconsider the theoretical understanding of batch normalization

Abstract

Batch normalization is one of the most important regularization techniques for neural networks, significantly improving training by centering the layers of the neural network. There have been several attempts to provide a theoretical justification for batch normalization. Santurkar and Tsipras (2018) [How does batch normalization help optimization? Advances in neural information processing systems, 31] claim that batch normalization improves initialization. We provide a counterexample showing that this claim is not true, i.e., batch normalization does not improve initialization.
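
For readers unfamiliar with the "centering" the abstract refers to, the following is a minimal NumPy sketch of a standard batch normalization forward pass at training time. It is not the paper's counterexample, which is not reproduced here. The function name `batch_norm`, the default values of `gamma`, `beta` and `eps`, and the synthetic input are all illustrative assumptions.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature over the batch axis, then scale and shift.

    x has shape (batch_size, num_features). gamma, beta and eps are
    illustrative defaults, not values taken from the paper.
    """
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature (biased) batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # center and rescale
    return gamma * x_hat + beta              # learnable scale and shift


# Synthetic activations with non-zero mean and non-unit variance (assumed data).
rng = np.random.default_rng(0)
activations = rng.normal(loc=3.0, scale=2.0, size=(256, 4))
normalized = batch_norm(activations)

print(normalized.mean(axis=0))  # per-feature means, close to 0
print(normalized.var(axis=0))   # per-feature variances, close to 1
```

With `gamma=1` and `beta=0`, each feature of the output has approximately zero mean and unit variance over the batch, which is the centering effect the abstract describes.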

Taxonomy

Topics: Fault Detection and Control Systems · VLSI and Analog Circuit Testing · Optimal Experimental Design Methods

Methods: Batch Normalization