Batchless Normalization: How to Normalize Activations Across Instances with Minimal Memory Requirements

Benjamin Berger (Leibniz Universität Hannover); Victor Uc Cetina (Universidad Autónoma de Yucatán)

arXiv:2212.14729·cs.LG·July 26, 2024

TL;DR

This paper introduces a memory-efficient normalization method for neural network activations that eliminates the need for batch statistics, simplifying implementation and reducing hardware requirements.

Contribution

It proposes a novel loss-based normalization approach that removes the need for batch-dependent statistics, addressing memory and complexity issues of batch normalization.

Findings

1. Reduces memory consumption during training.
2. Simplifies normalization implementation.
3. Potentially democratizes AI research by lowering hardware barriers.

Abstract

In training neural networks, batch normalization has many benefits, not all of them entirely understood. But it also has some drawbacks. Foremost is arguably memory consumption, as computing the batch statistics requires all instances within the batch to be processed simultaneously, whereas without batch normalization it would be possible to process them one by one while accumulating the weight gradients. Another drawback is that the distribution parameters (mean and standard deviation) are unlike all other model parameters in that they are not trained using gradient descent but require special treatment, complicating implementation. In this paper, I show a simple and straightforward way to address these issues. The idea, in short, is to add terms to the loss that, for each activation, cause the minimization of the negative log likelihood of a Gaussian distribution that is used to…
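The idea in the abstract can be illustrated with a minimal, pure-Python sketch for a single scalar activation. This is not the authors' implementation: the function names (`gaussian_nll`, `nll_grads`, `normalize`, `fit_statistics`), the log-sigma parametrization, the learning rate, and the step count are all assumptions made for illustration. The sketch only shows that a mean and standard deviation trained by gradient descent on a Gaussian negative log likelihood, with gradients accumulated one instance at a time, end up matching the statistics of the observed activations. How these terms interact with the rest of the network's gradients is not covered here.

```python
import math


def gaussian_nll(x, mu, log_sigma):
    """Negative log likelihood of x under N(mu, exp(log_sigma)**2)."""
    z = (x - mu) / math.exp(log_sigma)
    return log_sigma + 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)


def nll_grads(x, mu, log_sigma):
    """Analytic gradients of gaussian_nll w.r.t. mu and log_sigma."""
    sigma = math.exp(log_sigma)
    z = (x - mu) / sigma
    return -z / sigma, 1.0 - z * z


def normalize(x, mu, log_sigma):
    """Normalize one activation with the learned parameters."""
    return (x - mu) / math.exp(log_sigma)


def fit_statistics(activations, steps=1000, lr=0.1):
    """Train mu and log_sigma by plain gradient descent on the mean NLL.

    Instances are visited one by one and only two running gradient sums
    are kept, so no batch of activations has to be held for statistics.
    """
    mu, log_sigma = 0.0, 0.0  # assumed initial values
    n = len(activations)
    for _ in range(steps):
        g_mu, g_ls = 0.0, 0.0
        for x in activations:  # one instance at a time
            d_mu, d_ls = nll_grads(x, mu, log_sigma)
            g_mu += d_mu
            g_ls += d_ls
        mu -= lr * g_mu / n
        log_sigma -= lr * g_ls / n
    return mu, log_sigma
```

Calling `fit_statistics` on a list of activation values returns parameters that can be passed to `normalize`. Parametrizing the standard deviation through its logarithm keeps it positive without a constraint, which is a design choice of this sketch rather than something stated in the abstract.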

Code & Models

Repositories

ichteltelch/Batchless (tf, Official)

Taxonomy

Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning

Methods: Batch Normalization