Revisiting Batch Norm Initialization

Jim Davis; Logan Frank

arXiv:2110.13989 · cs.CV · July 18, 2022 · 1 citation

TL;DR

This paper proposes a new initialization method and update approach for batch normalization (BN) that improves the training stability and performance of deep neural networks by addressing shortcomings of the standard initialization (scale = 1, shift = 0).

Contribution

It introduces a new BN initialization technique that improves training without adding computational cost, and supports this with experiments evaluated using statistical significance tests.

Findings

1. Proper BN scale initialization improves model performance.
2. The new method stabilizes training by preventing overly large normalized values.
3. Experiments show statistically significant gains over the standard initialization.
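The second finding concerns the normalization step producing overly large values. The sketch below is an illustration of our own, not code from the paper: it standardizes a single hypothetical channel that contains one outlier among zeros, using the biased batch variance and ε = 0. In that case the outlier's normalized value is exactly √(N − 1) regardless of its raw magnitude, so larger batches permit larger normalized values.

```python
import math


def normalize(values, eps=0.0):
    """Standardize one channel of a batch: subtract the batch mean and divide by
    the batch standard deviation (biased/population variance, as BN uses in training)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical batches: N - 1 zeros plus a single outlier of value 5.0.
for n in (8, 64, 256):
    batch = [0.0] * (n - 1) + [5.0]
    x_hat = normalize(batch)
    # The outlier's normalized value matches sqrt(N - 1), independent of the 5.0.
    print(n, round(max(x_hat), 3), round(math.sqrt(n - 1), 3))
```

This is the worst case: for a batch of N values, no standardized value can exceed √(N − 1) in magnitude (Samuelson's inequality), so the bound grows with batch size.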

Abstract

Batch normalization (BN) consists of a normalization component followed by an affine transformation and has become essential for training deep neural networks. Standard initialization of each BN in a network sets the affine transformation scale and shift to 1 and 0, respectively. However, after training we have observed that these parameters do not change much from their initialization. Furthermore, we have noticed that the normalization process can still yield overly large values, which is undesirable for training. We revisit the BN formulation and present a new initialization method and update approach for BN to address the aforementioned issues. Experiments are designed to emphasize and demonstrate the positive influence of proper BN scale initialization on performance, and use rigorous statistical significance tests for evaluation. The approach can be used with existing…
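The abstract describes BN as a normalization step followed by an affine transformation whose scale and shift are initialized to 1 and 0. A minimal pure-Python sketch of a training-mode BN forward pass over vector features follows. The class name BatchNorm1dSketch and the gamma_init argument are hypothetical, the code does not reproduce the authors' initialization values or their update approach, and running statistics for inference are omitted.

```python
import math
from typing import List


class BatchNorm1dSketch:
    """Training-mode BN over a batch of C-channel feature vectors (illustrative sketch).

    gamma (scale) and beta (shift) form the affine transform applied after
    normalization. The standard initialization is gamma = 1, beta = 0;
    `gamma_init` is a hypothetical knob for trying other scale initializations.
    """

    def __init__(self, num_channels: int, gamma_init: float = 1.0, eps: float = 1e-5):
        self.gamma = [gamma_init] * num_channels  # affine scale, one per channel
        self.beta = [0.0] * num_channels          # affine shift, one per channel
        self.eps = eps

    def forward(self, batch: List[List[float]]) -> List[List[float]]:
        n = len(batch)
        num_channels = len(self.gamma)
        out = [[0.0] * num_channels for _ in range(n)]
        for c in range(num_channels):
            column = [row[c] for row in batch]
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / n  # biased batch variance
            inv_std = 1.0 / math.sqrt(var + self.eps)
            for i, v in enumerate(column):
                x_hat = (v - mean) * inv_std                       # normalization component
                out[i][c] = self.gamma[c] * x_hat + self.beta[c]   # affine transformation
        return out


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
standard = BatchNorm1dSketch(2)                  # standard init: gamma = 1, beta = 0
smaller = BatchNorm1dSketch(2, gamma_init=0.1)   # hypothetical smaller initial scale
print(standard.forward(batch))
print(smaller.forward(batch))
```

With the standard gamma_init = 1.0 and beta = 0, the affine transform is the identity at initialization, so the layer initially outputs the normalized values unchanged. If, as the authors observe, the scale barely moves during training, its initial value largely sets the output scale, which suggests why the choice of initialization can matter.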

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Data Classification · Speech Recognition and Synthesis