Batchless Normalization: How to Normalize Activations Across Instances with Minimal Memory Requirements
Benjamin Berger (Leibniz Universität Hannover), Victor Uc Cetina (Universidad Autónoma de Yucatán)

TL;DR
This paper introduces a memory-efficient normalization method for neural network activations that eliminates the need for batch statistics, simplifying implementation and reducing hardware requirements.
Contribution
It proposes a novel loss-based normalization approach that removes the need for batch-dependent statistics, addressing memory and complexity issues of batch normalization.
Findings
Reduces memory consumption during training.
Simplifies normalization implementation.
Potentially democratizes AI research by lowering hardware barriers.
Abstract
In training neural networks, batch normalization has many benefits, not all of them entirely understood. But it also has some drawbacks. Foremost is arguably memory consumption, as computing the batch statistics requires all instances within the batch to be processed simultaneously, whereas without batch normalization it would be possible to process them one by one while accumulating the weight gradients. Another drawback is that the distribution parameters (mean and standard deviation) are unlike all other model parameters in that they are not trained using gradient descent but require special treatment, complicating implementation. In this paper, I show a simple and straightforward way to address these issues. The idea, in short, is to add terms to the loss that, for each activation, cause the minimization of the negative log likelihood of a Gaussian distribution that is used to…
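The loss-term idea in the abstract can be illustrated with a small NumPy sketch. Because the abstract is cut off above, several details here are assumptions rather than the authors' method: that the fitted Gaussian is used to normalize the activation, that the standard deviation is parameterized as `log_sigma`, and all function names (`gaussian_nll`, `nll_grads`, `fit_statistics`, `normalize`) are hypothetical. The sketch only updates the distribution parameters by plain gradient descent, visiting instances one at a time and accumulating gradients, so no batch statistics are ever computed.

```python
import numpy as np

def gaussian_nll(x, mu, log_sigma):
    """Negative log likelihood of x under N(mu, exp(log_sigma)^2), per activation."""
    z = (x - mu) * np.exp(-log_sigma)
    return 0.5 * z**2 + log_sigma + 0.5 * np.log(2.0 * np.pi)

def nll_grads(x, mu, log_sigma):
    """Analytic gradients of gaussian_nll with respect to mu and log_sigma."""
    inv_sigma = np.exp(-log_sigma)
    z = (x - mu) * inv_sigma
    return -z * inv_sigma, 1.0 - z**2

def fit_statistics(samples, lr=0.1, steps=1500):
    """Train mu and log_sigma by gradient descent on the NLL terms.

    Instances are processed one by one and their gradients accumulated,
    so only a single instance needs to be held at a time (assumed setup).
    """
    n, width = samples.shape
    mu = np.zeros(width)
    log_sigma = np.zeros(width)
    for _ in range(steps):
        g_mu = np.zeros(width)
        g_ls = np.zeros(width)
        for x in samples:  # one instance at a time
            d_mu, d_ls = nll_grads(x, mu, log_sigma)
            g_mu += d_mu
            g_ls += d_ls
        mu -= lr * g_mu / n
        log_sigma -= lr * g_ls / n
    return mu, log_sigma

def normalize(x, mu, log_sigma):
    """Normalize activations with the learned parameters (assumed usage)."""
    return (x - mu) * np.exp(-log_sigma)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(3.0, 2.0, size=(64, 4))  # synthetic activations
    mu, log_sigma = fit_statistics(acts)
    print("learned mean:", mu)
    print("learned std: ", np.exp(log_sigma))
```

Minimizing the Gaussian negative log likelihood drives `mu` and `exp(log_sigma)` toward the mean and standard deviation of the activations seen during training, which is why these parameters can be treated like any other trainable parameter in this sketch.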
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning
Methods: Batch Normalization
