Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

TL;DR
This paper identifies two overlooked batch statistics in the backward pass of Batch Normalization (BN) that impair training with small batches. It introduces MABN, a normalization method that restores BN's performance without adding inference-time complexity, and validates it on ImageNet and COCO.
Contribution
The paper reveals two previously unconsidered batch statistics in BN's backward propagation and proposes MABN, a novel normalization method that maintains BN's performance with small batch sizes.
Findings
MABN restores BN performance in small batch scenarios.
Theoretical analysis supports MABN's effectiveness.
Experiments on ImageNet and COCO validate improvements.
Abstract
Batch Normalization (BN) is one of the most widely used techniques in deep learning, but its performance can degrade severely when the batch size is too small. This weakness limits the use of BN in many computer vision tasks such as detection and segmentation, where the batch size is usually small because of memory constraints. Many modified normalization techniques have therefore been proposed, but they either fail to fully restore the performance of BN or introduce additional nonlinear operations at inference time that greatly increase resource consumption. In this paper, we reveal that two extra batch statistics are involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics, which are associated with gradients, can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel…
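To make the abstract's claim concrete, the sketch below writes out the textbook backward pass of BN (without the learnable scale and shift) in NumPy. Two batch means show up in the gradient: the mean of the upstream gradient and the mean of the upstream gradient times the normalized output. These are presumably the two extra batch statistics the abstract refers to. The names `g_mean` and `psi`, the helper functions, and the toy loss are illustrative assumptions, and this is the standard BN gradient, not an implementation of MABN.

```python
import numpy as np


def bn_forward(x, eps=1e-5):
    """Normalize each feature (column) of x over the batch axis, no affine terms."""
    mu = x.mean(axis=0)                  # forward batch statistic 1: mean
    var = x.var(axis=0)                  # forward batch statistic 2: variance
    inv_std = 1.0 / np.sqrt(var + eps)
    y = (x - mu) * inv_std
    return y, inv_std


def bn_backward(dy, y, inv_std):
    """Standard BN backward pass.

    Returns dL/dx together with the two batch statistics that only
    appear in the backward direction (names are illustrative).
    """
    g_mean = dy.mean(axis=0)             # batch mean of dL/dy
    psi = (dy * y).mean(axis=0)          # batch mean of y * dL/dy
    dx = inv_std * (dy - g_mean - y * psi)
    return dx, g_mean, psi


def numeric_grad(x, w, h=1e-6):
    """Central-difference gradient of the toy loss sum(w * BN(x)) w.r.t. x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        xp = x.copy()
        xm = x.copy()
        xp[idx] += h
        xm[idx] -= h
        loss_p = np.sum(w * bn_forward(xp)[0])
        loss_m = np.sum(w * bn_forward(xm)[0])
        grad[idx] = (loss_p - loss_m) / (2 * h)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # small batch: 4 samples, 3 features
    w = rng.normal(size=(4, 3))          # dL/dy for the toy loss sum(w * y)
    y, inv_std = bn_forward(x)
    dx, g_mean, psi = bn_backward(w, y, inv_std)
    print("max abs error vs. finite differences:",
          np.abs(dx - numeric_grad(x, w)).max())
    print("g_mean:", g_mean)
    print("psi:   ", psi)
```

Because `g_mean` and `psi` are averages over the batch, they are estimated from very few samples when the batch is small, in the same way as the forward mean and variance. That is the sense in which the backward pass carries batch statistics of its own.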
Taxonomy
Topics: Blind Source Separation Techniques · Algorithms and Data Compression · Advanced Data Compression Techniques
Methods: Batch Normalization
