Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches
Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, and Liang Lin

TL;DR
This paper introduces Batch Kalman Normalization (BKN), a new method that improves deep neural network training with micro-batches by considering all layers collectively, leading to more stable training and better accuracy with smaller batch sizes.
Contribution
BKN is a novel normalization technique that models all network layers as a single system. This improves training stability and accuracy, especially with micro-batches, and lets BKN outperform standard Batch Normalization (BN).
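The idea of treating the layers as one system can be illustrated with a Kalman-filter-style update. A prediction of a layer's statistics is carried forward from the preceding layer and then blended with the noisy statistic observed in the current micro-batch. The sketch below is a minimal illustration under stated assumptions, not the paper's exact formulation: the linear transition matrix `A`, the scalar `gain`, and the function name `kalman_style_estimate` are hypothetical, and only the mean is estimated.

```python
import numpy as np

def kalman_style_estimate(prev_est, transition, batch_stat, gain):
    """Hypothetical Kalman-style estimate of a layer's feature mean.

    prev_est:   estimated mean of the preceding layer
    transition: assumed linear map from the preceding layer's statistics
                to this layer's statistics (illustrative only)
    batch_stat: mean observed in the current micro-batch (noisy)
    gain:       scalar in [0, 1] acting like a Kalman gain
    """
    # Prediction step: carry the preceding layer's estimate forward.
    predicted = transition @ prev_est
    # Update step: blend prediction and observation; a small gain trusts the
    # prediction more, which helps when the micro-batch statistic is noisy.
    return (1.0 - gain) * predicted + gain * batch_stat

if __name__ == "__main__":
    prev_mean = np.array([1.0, 2.0])          # preceding layer's estimate
    A = np.eye(2)                             # hypothetical transition matrix
    x = np.array([[3.0, 5.0], [3.0, 3.0]])    # micro-batch: 2 samples, 2 features
    batch_mean = x.mean(axis=0)               # [3.0, 4.0]
    print(kalman_style_estimate(prev_mean, A, batch_mean, gain=0.5))  # [2. 3.]
```

Setting `gain=1.0` recovers plain BN behaviour (only the current batch is used), while `gain=0.0` ignores the batch entirely; the interesting regime for micro-batches lies in between.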
Findings
BKN achieves higher accuracy than BN on ImageNet when trained with smaller batch sizes.
Training with BKN converges faster than with BN.
BKN performs well with batch sizes 64 times smaller on CIFAR and 8 times smaller on ImageNet.
Abstract
As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g., fewer than 10 samples in a mini-batch), since statistics estimated from so few samples are unreliable. In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding…
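Why micro-batches hurt BN can be checked numerically: BN normalizes with the mean and variance of the current batch, and the spread of a batch mean shrinks only as 1/sqrt(batch size). The sketch below assumes training-mode BN without the learnable scale/shift or running averages, and uses a synthetic standard-normal feature; the function names are illustrative.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Training-mode BN: normalize each feature (column) using only the
    # statistics of the current mini-batch (no gamma/beta, no running stats).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def batch_mean_spread(batch_size, trials=2000, seed=0):
    # Empirical standard deviation of the mini-batch mean of a
    # standard-normal feature, over many independently drawn batches.
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((trials, batch_size))
    return samples.mean(axis=1).std()

if __name__ == "__main__":
    for b in (2, 4, 32, 128):
        # Should be close to 1 / sqrt(b): smaller batches give noisier means.
        print(b, round(batch_mean_spread(b), 3))
```

In the extreme case of a single sample, the batch variance is zero and `batch_norm` maps every input to 0, which is one way to see why BN degrades as the batch shrinks.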
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Batch Normalization
