Augment your batch: better training with larger batches

Elad Hoffer; Tal Ben-Nun; Itay Hubara; Niv Giladi; Torsten Hoefler; Daniel Soudry

arXiv:1901.09335·cs.LG·January 29, 2019·50 cites

TL;DR

This paper introduces batch augmentation, a technique that replicates each sample several times within the same batch and applies a different data augmentation to each copy. It acts as both a regularizer and an accelerator, improving training speed and generalization in deep neural networks.
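To make the construction concrete, here is a minimal NumPy sketch of how a batch-augmented batch could be assembled. It assumes images in (H, W, C) layout, and it assumes a random horizontal flip plus a zero-padded random crop as the augmentation; both that choice and the helper names (random_flip_crop, batch_augment) are hypothetical illustrations, not the authors' code. B distinct samples become an M·B batch in which every copy gets its own independent augmentation, and labels are repeated to match.

```python
import numpy as np

def random_flip_crop(image, rng, pad=4):
    # Hypothetical augmentation (an assumption, not taken from the paper's code):
    # random horizontal flip, then a random crop from a zero-padded image.
    h, w = image.shape[:2]
    if rng.random() < 0.5:
        image = image[:, ::-1]  # flip along the width axis (H, W, C layout)
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

def batch_augment(images, labels, m, augment, rng):
    # Replicate each of the B samples M times; every copy receives an
    # independently drawn augmentation. Copies of one sample are contiguous,
    # which matches the ordering produced by np.repeat on the labels.
    copies = [augment(img, rng) for img in images for _ in range(m)]
    return np.stack(copies), np.repeat(labels, m)

rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3))  # B = 8 toy "images"
labels = np.arange(8)
xb, yb = batch_augment(images, labels, m=4, augment=random_flip_crop, rng=rng)
print(xb.shape, yb[:8])
```

With B = 8 and M = 4, a single step processes 32 augmented images that come from only 8 distinct samples. This larger per-step batch is what allows more computation to run concurrently for each SGD update.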

Contribution

The paper proposes batch augmentation as a novel method to enhance large-batch SGD training, improving convergence and generalization without extensive hyperparameter tuning.

Findings

1. Batch augmentation reduces the number of SGD updates needed to reach a target accuracy.
2. It empirically improves convergence across a variety of neural network architectures.
3. It improves generalization and training speed in large-batch settings.

Abstract

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the state-of-the-art. Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.
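The abstract refers to an analysis of how batch augmentation affects gradient variance. The following toy sketch is not the paper's analysis. It illustrates the intuition under stated assumptions: a linear least-squares model, with additive Gaussian input noise standing in for data augmentation, and B distinct samples per step, each replicated M times. The helpers batch_gradient, augmented_gradient and gradient_variance are hypothetical. All M copies of a sample share the same underlying example and differ only in their independent perturbations. Averaging over the copies therefore shrinks only the augmentation-induced part of the gradient noise, so the total variance drops, but by less than a factor of M.

```python
import numpy as np

def batch_gradient(w, xb, yb):
    # Gradient of 0.5 * mean((xb @ w - yb)^2) with respect to w.
    residual = xb @ w - yb
    return xb.T @ residual / len(yb)

def augmented_gradient(w, x, y, batch_size, m, sigma, rng):
    # Draw B distinct samples, replicate each M times, and perturb every copy
    # with independent Gaussian noise (an assumed stand-in for augmentation).
    idx = rng.choice(len(y), size=batch_size, replace=False)
    xb = np.repeat(x[idx], m, axis=0)
    yb = np.repeat(y[idx], m)
    xb = xb + sigma * rng.standard_normal(xb.shape)
    return batch_gradient(w, xb, yb)

def gradient_variance(m, trials=2000, batch_size=32, sigma=1.0, seed=0):
    # Same seed -> same synthetic dataset for every value of m.
    rng = np.random.default_rng(seed)
    n, d = 10_000, 5
    x = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = x @ w_true + 0.1 * rng.standard_normal(n)
    w = np.zeros(d)  # evaluate gradients at a fixed point
    grads = np.stack([augmented_gradient(w, x, y, batch_size, m, sigma, rng)
                      for _ in range(trials)])
    # Total variance: sum of per-coordinate variances of the gradient estimate.
    return grads.var(axis=0).sum()

for m in (1, 4):
    print(m, gradient_variance(m))
```

Holding B fixed, M = 4 should give a visibly smaller total variance than M = 1 in this setup. The sampling noise from choosing which B examples enter the batch is left unchanged.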

Code & Models

Repositories

vaapopescu/gradient-pruning (PyTorch)

Taxonomy

Topics: Machine Learning and Algorithms

Methods: Stochastic Gradient Descent