Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Elad Hoffer; Itay Hubara; Daniel Soudry

arXiv:1705.08741 · stat.ML · January 3, 2018 · 419 cites

TL;DR

This paper investigates the causes of the generalization gap in large-batch neural network training. It proposes an adapted training regime that closes the gap by training for more updates, and a normalization technique, Ghost Batch Normalization, that reduces the gap without increasing the number of updates.

Contribution

The paper introduces a "random walk on random landscape" statistical model of the initial training phase, demonstrates that the gap stems from the number of weight updates rather than the batch size, and proposes Ghost Batch Normalization to mitigate the issue.
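To make the "number of updates rather than batch size" point concrete, here is a minimal arithmetic sketch. It assumes one weight update per mini-batch and a fixed epoch budget; the dataset size, batch sizes, and the helper names `num_updates` and `adapted_epochs` are hypothetical and chosen only for illustration, not taken from the paper's experiments.

```python
import math


def num_updates(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total weight updates: one per mini-batch, per epoch."""
    return epochs * math.ceil(num_samples / batch_size)


def adapted_epochs(base_epochs: int, small_batch: int, large_batch: int) -> int:
    """Epochs needed at large_batch to roughly match the small-batch update count."""
    return math.ceil(base_epochs * large_batch / small_batch)


# Hypothetical numbers, for illustration only.
N = 50_000
small, large, epochs = 128, 4096, 100

baseline = num_updates(N, small, epochs)    # small-batch run
unadapted = num_updates(N, large, epochs)   # same epochs, large batch
adapted = num_updates(N, large, adapted_epochs(epochs, small, large))

print(baseline, unadapted, adapted)
```

With the epoch count held fixed, the large-batch run performs far fewer updates than the small-batch run; scaling the epochs by the batch-size ratio brings the update count back to roughly the baseline (not exactly, because of rounding in the number of batches per epoch).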

Findings

01

The generalization gap is primarily due to insufficient training updates.

02

Adapting training regimes can eliminate the gap.

03

Ghost Batch Normalization significantly reduces the gap.
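The idea behind Ghost Batch Normalization can be sketched as follows: instead of computing normalization statistics over the whole large batch, the batch is split into smaller virtual ("ghost") batches and each one is normalized with its own mean and variance. The NumPy code below is a rough training-mode illustration under that assumption; the function name, the `ghost_size` parameter, and the omission of running statistics for inference are simplifications of this sketch, not the authors' implementation (see the linked repository for that).

```python
import numpy as np


def ghost_batch_norm(x, ghost_size, gamma=None, beta=None, eps=1e-5):
    """Normalize x of shape (batch, features) separately within each ghost batch."""
    batch, features = x.shape
    if batch % ghost_size != 0:
        raise ValueError("batch size must be divisible by ghost_size")
    gamma = np.ones(features) if gamma is None else gamma
    beta = np.zeros(features) if beta is None else beta

    # Split the large batch into (num_ghosts, ghost_size, features).
    chunks = x.reshape(batch // ghost_size, ghost_size, features)

    # Per-ghost-batch statistics, computed over the samples in each chunk.
    mean = chunks.mean(axis=1, keepdims=True)
    var = chunks.var(axis=1, keepdims=True)

    normed = (chunks - mean) / np.sqrt(var + eps)
    return normed.reshape(batch, features) * gamma + beta


# Synthetic activations standing in for a large batch of 512 samples.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(512, 8))
y = ghost_batch_norm(x, ghost_size=32)
```

Each block of 32 rows in `y` has approximately zero mean and unit variance per feature, so the normalization noise resembles that of a small-batch run even though the optimizer still sees one large batch per update.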

Abstract

Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance, known as the "generalization gap" phenomenon. Identifying the origin of this gap and closing it had remained an open problem.

Contributions: We examine the initial high learning rate training phase. We find that the weight distance from its initialization grows logarithmically with the number of weight updates. We therefore propose a "random walk on random landscape" statistical model which is known to exhibit similar "ultra-slow" diffusion behavior. Following this hypothesis we conducted experiments to show empirically that the "generalization gap"…
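The logarithmic-growth claim in the abstract can be checked on one's own training logs by fitting the distance from initialization, d(t) = ||w_t - w_0||, to the form a * log(t) + b. The sketch below does this with an ordinary least-squares fit in pure Python. The function name `fit_log_growth` is hypothetical, and the data are synthetic values generated from the assumed functional form, not measurements from the paper.

```python
import math


def fit_log_growth(steps, distances):
    """Least-squares fit of distances ~ a * log(step) + b; returns (a, b)."""
    xs = [math.log(t) for t in steps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic distances that follow d(t) = 2 * log(t) + 1 exactly.
steps = [1, 10, 100, 1000, 10000]
distances = [2.0 * math.log(t) + 1.0 for t in steps]

a, b = fit_log_growth(steps, distances)
```

In practice, `distances` would be recorded during training as the Euclidean norm of the difference between the current and initial weights; a good linear fit against log(t) is consistent with the "ultra-slow" diffusion behavior the abstract describes.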

Code & Models

Repositories

eladhoffer/bigBatch (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis