Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Sergey Ioffe

TL;DR
This paper introduces Batch Renormalization, a method that reduces minibatch dependence in batch-normalized models, improving training stability and performance with small or non-i.i.d. minibatches while maintaining batch normalization benefits.
Contribution
It proposes Batch Renormalization, an extension to Batch Normalization that aligns training and inference outputs, especially effective for small or non-i.i.d. minibatches.
Findings
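Batch Normalization becomes less effective when training minibatches are small or do not consist of independent samples.
Models trained with Batch Renormalization perform substantially better than batchnorm in those settings.
Batch Renormalization retains the benefits of batchnorm, such as insensitivity to initialization and training efficiency.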
Abstract
Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches. At the same time, Batch Renormalization retains the benefits of batchnorm such as insensitivity to initialization and training efficiency.
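The abstract does not spell out the correction itself, so the following is only a minimal NumPy sketch of the forward pass, assuming the usual formulation in which batch-normalized activations are adjusted by a clipped scale r and offset d computed from the moving statistics. The function names, the default values of r_max, d_max, and eps, and the omission of the learned scale and shift are choices made for this illustration, not taken from this page. In an actual training setup r and d would be treated as constants during backpropagation, which this forward-only sketch does not model.

```python
import numpy as np


def batch_renorm_train(x, moving_mean, moving_std, r_max=3.0, d_max=5.0, eps=1e-5):
    """Training-mode forward pass (hypothetical helper, no learned scale/shift).

    x has shape (batch, features); moving_mean and moving_std have shape (features,).
    """
    mu_b = x.mean(axis=0)                   # minibatch mean
    sigma_b = np.sqrt(x.var(axis=0) + eps)  # minibatch standard deviation
    # Correction terms relating minibatch statistics to moving statistics.
    # Assumption: in a real framework these are wrapped in a stop-gradient.
    r = np.clip(sigma_b / moving_std, 1.0 / r_max, r_max)
    d = np.clip((mu_b - moving_mean) / moving_std, -d_max, d_max)
    return (x - mu_b) / sigma_b * r + d


def batch_renorm_infer(x, moving_mean, moving_std):
    """Inference-mode forward pass: normalize with the moving statistics only."""
    return (x - moving_mean) / moving_std


# Small illustrative batch: 4 examples, 2 features.
x = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]])
moving_mean = np.array([3.5, 2.5])
moving_std = np.array([2.0, 1.5])

train_out = batch_renorm_train(x, moving_mean, moving_std)
infer_out = batch_renorm_infer(x, moving_mean, moving_std)
```

When neither r nor d is clipped, the training output reduces algebraically to (x - moving_mean) / moving_std, so each example's activation matches what the inference model would produce and no longer depends on the rest of the minibatch. Setting r_max = 1 and d_max = 0 in this sketch recovers ordinary batch normalization.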
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
