Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

Sergey Ioffe

arXiv:1702.03275·cs.LG·March 31, 2017·245 cites

TL;DR

This paper introduces Batch Renormalization, a method that reduces minibatch dependence in batch-normalized models, improving training stability and performance with small or non-i.i.d. minibatches while maintaining batch normalization benefits.

Contribution

It proposes Batch Renormalization, an extension to Batch Normalization that aligns training and inference outputs, especially effective for small or non-i.i.d. minibatches.

Findings

01 Improves model performance when training with small or non-i.i.d. minibatches.

02 Retains the benefits of batch normalization, such as insensitivity to initialization and training efficiency.

03 Models trained with Batch Renormalization perform substantially better than standard batchnorm when minibatches are small or non-i.i.d.

Abstract

Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches. At the same time, Batch Renormalization retains the benefits of batchnorm such as insensitivity to initialization and training efficiency.
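The abstract does not spell out the correction itself, so the following is only a minimal forward-pass sketch of the idea as it is commonly described: normalize with minibatch statistics, then apply a clipped affine correction (r, d) so the result matches normalization by the moving averages used at inference. The function name `batch_renorm_forward`, the scalar-feature setup, and the default values of `r_max`, `d_max`, and `eps` are illustrative assumptions, not taken from this page. In training, r and d would be treated as constants during backpropagation, which this sketch does not model.

```python
import math

def batch_renorm_forward(xs, running_mean, running_std,
                         r_max=3.0, d_max=5.0, eps=1e-5):
    """Hypothetical helper: renormalize one scalar feature over a minibatch.

    xs            -- list of activations for one feature, one per example
    running_mean  -- moving-average mean used at inference
    running_std   -- moving-average standard deviation used at inference
    r_max, d_max  -- clipping bounds (illustrative defaults, an assumption)
    """
    n = len(xs)
    mu_b = sum(xs) / n                                  # minibatch mean
    var_b = sum((x - mu_b) ** 2 for x in xs) / n        # biased minibatch variance
    sigma_b = math.sqrt(var_b + eps)                    # minibatch std

    # Correction factors relating minibatch statistics to the moving averages.
    r = min(max(sigma_b / running_std, 1.0 / r_max), r_max)
    d = min(max((mu_b - running_mean) / running_std, -d_max), d_max)

    # Batchnorm-style normalization followed by the affine correction.
    x_hat = [((x - mu_b) / sigma_b) * r + d for x in xs]
    return x_hat, r, d


xs = [1.0, 2.0, 4.0]
x_hat, r, d = batch_renorm_forward(xs, running_mean=2.0, running_std=1.5)
# When r and d are not clipped, each output equals (x - running_mean) / running_std,
# so it depends on the individual example rather than on the rest of the minibatch.
```

Setting `r_max=1.0` and `d_max=0.0` forces r = 1 and d = 0, which reduces the sketch to ordinary minibatch normalization.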

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis