Training Deep Neural Networks Without Batch Normalization
Divya Gaur, Joachim Folz, and Andreas Dengel

TL;DR
This paper investigates whether deep neural networks can be trained effectively without batch normalization, comparing alternative methods and examining the underlying effects of normalization techniques.
Contribution
It provides a detailed analysis of batch normalization and explores training strategies that enable effective neural network training without it.
Findings
Batch normalization can be replaced with alternative techniques like weight normalization and dropout.
Training without batch normalization is feasible with proper adjustments to the training process.
Theoretical insights into the role of normalization in neural network optimization are discussed.
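For readers unfamiliar with weight normalization, one of the alternatives named above, the following is a minimal sketch of its standard reparameterization, in which a weight vector is written as a learned scalar magnitude times a unit direction. The names `weight_norm`, `v`, and `g` are illustrative assumptions and are not taken from the paper, which may configure the technique differently.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a single neuron's weight vector.

    v: list of floats, the unnormalized direction parameters (hypothetical name).
    g: float, the learned magnitude of the resulting weight vector.
    """
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


# The length of w depends only on g, not on the scale of v.
w = weight_norm([3.0, 4.0], g=2.0)
w_rescaled = weight_norm([30.0, 40.0], g=2.0)
length = math.sqrt(sum(x * x for x in w))
```

Unlike batch normalization, this reparameterization depends only on the weights, so it does not use any mini-batch statistics.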
Abstract
Training neural networks is an optimization problem, and finding a decent set of parameters through gradient descent can be a difficult task. A host of techniques has been developed to aid this process before and during the training phase. One of the most important and widely used classes of methods is normalization. It is generally favorable for neurons to receive inputs that are distributed with zero mean and unit variance, so we use statistics about the dataset to normalize them before the first layer. However, this property cannot be guaranteed for the intermediate activations inside the network. A widely used method to enforce this property inside the network is batch normalization. It was developed to combat covariate shift inside networks. Empirically it is known to work, but there is a lack of theoretical understanding about its effectiveness and potential drawbacks it might have when…
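To make the zero-mean, unit-variance property concrete, here is a minimal sketch of the training-mode forward pass of batch normalization for a single feature. It assumes the commonly used formulation with a learnable scale and shift; the names `batch_norm`, `gamma`, `beta`, and the value of `eps` are illustrative assumptions, not details taken from the paper.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's activations across a mini-batch.

    batch: list of floats, one activation per sample in the mini-batch.
    gamma, beta: scale and shift (learnable in practice, plain floats here).
    eps: small constant for numerical stability (assumed value).
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance computed over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in batch]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)

# Check the statistics of the normalized activations.
out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
```

With the default `gamma` and `beta`, `out_mean` is approximately zero and `out_var` is slightly below one because of `eps`. Normalizing the raw inputs before the first layer follows the same arithmetic, with the mean and variance taken from the whole dataset rather than from a mini-batch.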
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
Methods: Gradient Clipping · Batch Normalization
