Double Forward Propagation for Memorized Batch Normalization
Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan

TL;DR
This paper introduces memorized batch normalization (MBN), trained with a double-forward scheme, to make batch normalization more stable and its training and inference rules more consistent in deep neural networks, especially at small batch sizes, which in turn improves generalization.
Contribution
The paper proposes a novel memorized batch normalization method with a double-forward scheme to address BN limitations and improve robustness and performance.
Findings
MBN reduces sensitivity to data variations.
The double-forward scheme enhances training stability.
Models with MBN and the double-forward scheme outperform models with standard BN.
Abstract
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of data using a single minibatch. Consequently, BN can be unstable when the batch size is very small or the data is poorly sampled. In the inference stage, BN often uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are not consistent. Regarding these issues, we propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each…
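The idea of normalizing with statistics from several recent batches, rather than a single minibatch, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `MemorizedStats`, the `memory_size` parameter, and the uniform averaging of per-batch means and variances are all assumptions made for illustration, and the sketch covers the forward pass only (no gradients, no learnable scale or shift, and nothing from the truncated part of the abstract).

```python
from collections import deque

import numpy as np


class MemorizedStats:
    """Hypothetical helper: normalizes with statistics averaged over recent batches."""

    def __init__(self, memory_size=5, eps=1e-5):
        # Assumption: keep the last `memory_size` per-batch means and variances.
        self.means = deque(maxlen=memory_size)
        self.variances = deque(maxlen=memory_size)
        self.eps = eps

    def normalize(self, x):
        # x has shape (batch_size, num_features).
        self.means.append(x.mean(axis=0))
        self.variances.append(x.var(axis=0))
        # Assumption: uniform weights over the memorized batches.
        mean = np.stack(list(self.means)).mean(axis=0)
        var = np.stack(list(self.variances)).mean(axis=0)
        return (x - mean) / np.sqrt(var + self.eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    norm = MemorizedStats(memory_size=5)
    for _ in range(10):
        batch = rng.normal(loc=3.0, scale=2.0, size=(2, 4))  # very small batch
        out = norm.normalize(batch)
    print(out.shape)
```

With `memory_size=1` the sketch reduces to ordinary per-batch normalization; larger values smooth the statistics across batches, which is the effect the abstract attributes to considering multiple recent batches.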
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Batch Normalization · Stochastic Gradient Descent
