Double Forward Propagation for Memorized Batch Normalization

Yong Guo; Qingyao Wu; Chaorui Deng; Jian Chen; Mingkui Tan

arXiv:2010.04947 · cs.LG · October 13, 2020


TL;DR

This paper introduces memorized batch normalization (MBN) with a double-forward scheme to improve the stability and consistency of batch normalization (BN) in deep neural networks, especially with small batch sizes, and thereby improve generalization.

Contribution

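The paper proposes memorized batch normalization, combined with a double-forward scheme, to address two limitations of standard BN: its reliance on statistics from a single minibatch during training, and the mismatch between the batch statistics used in training and the moving statistics used in inference.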

Findings

1. MBN reduces sensitivity to data variations.

2. The double-forward scheme enhances training stability.

3. Models with MBN and the double-forward scheme outperform standard BN.

Abstract

Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of data using a single minibatch. Consequently, BN can be unstable when the batch size is very small or the data is poorly sampled. In the inference stage, BN often uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are not consistent. To address these issues, we propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each…
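The core idea of estimating statistics from several recent batches instead of one can be illustrated with a small sketch. It is not the authors' implementation: the class name MemorizedStats, the memory size k, and the equal per-sample weighting of remembered batches are all hypothetical choices made for illustration, and it tracks a single feature channel. The pooled mean and variance follow the law of total variance, so with k batches in memory they equal the statistics of those batches concatenated. The sketch also applies no correction for older batches having been computed under earlier network weights.

```python
import math
from collections import deque
from statistics import fmean, pvariance


class MemorizedStats:
    """Hypothetical helper: pools the mean and variance of one feature channel
    over the k most recent minibatches. Equal per-sample weighting is an
    assumption for illustration, not the weighting used in the paper."""

    def __init__(self, k):
        # Each entry is (batch size, batch mean, biased batch variance).
        # deque(maxlen=k) silently drops the oldest batch when full.
        self.memory = deque(maxlen=k)

    def update(self, batch):
        self.memory.append((len(batch), fmean(batch), pvariance(batch)))

    def stats(self):
        total = sum(n for n, _, _ in self.memory)
        mean = sum(n * m for n, m, _ in self.memory) / total
        # Law of total variance: within-batch variance plus the spread of
        # each batch mean around the pooled mean, weighted by batch size.
        var = sum(n * (v + (m - mean) ** 2) for n, m, v in self.memory) / total
        return mean, var


def normalize(batch, mean, var, eps=1e-5):
    """BN normalization step; learnable scale and shift are omitted."""
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


if __name__ == "__main__":
    memory = MemorizedStats(k=3)
    for batch in ([1.0, 2.0], [4.0, 3.0], [6.0, 5.0]):
        memory.update(batch)
    mean, var = memory.stats()
    # Standard BN training would use only the last (tiny) batch.
    print("single-batch:", fmean(batch), pvariance(batch))
    print("memorized:", mean, var)
    print(normalize(batch, mean, var))
```

With batches of size 2, the single-batch variance for the last batch is 0.25, while the memorized estimate over all three batches covers the full spread of the data seen recently; this is the kind of small-batch instability the abstract describes.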


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification

Methods: Batch Normalization · Stochastic Gradient Descent