Revisiting Small Batch Training for Deep Neural Networks

Dominic Masters; Carlo Luschi

arXiv:1804.07612·cs.LG·April 23, 2018·353 cites

TL;DR

This paper investigates the effects of mini-batch size on deep neural network training, revealing that smaller batches often yield better stability and performance, challenging the trend of using very large mini-batches.

Contribution

The study provides an experimental comparison of different mini-batch sizes under a consistent learning rate scheme, highlighting the advantages of small batches for stability and generalization.

Findings

01. Small mini-batches (2-32) outperform large ones in stability and test performance.

02. Increasing mini-batch size narrows the range of stable learning rates.

03. Large mini-batches in the thousands are less effective for training stability.

Abstract

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide improved generalization performance and allows a significantly smaller memory footprint, which might also be exploited to improve machine throughput. In this paper, we review common assumptions on learning rate scaling and training duration, as a basis for an experimental comparison of test performance for different mini-batch sizes. We adopt a learning rate that corresponds to a constant average weight update per gradient calculation (i.e., per unit cost of computation), and point out that this results in a variance of the weight updates that increases linearly with the mini-batch size $m$. The collected experimental results for the CIFAR-10,…
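The learning rate scheme described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a toy scalar problem in which per-example gradients are drawn independently from a normal distribution, and the names `update_stats`, `eta_tilde`, `mu`, and `sigma` are hypothetical. The learning rate is set to `eta_tilde * m`, so each update equals `-eta_tilde` times the sum of the `m` per-example gradients. Under these assumptions, the mean update per gradient calculation should stay roughly constant while the variance of the update grows roughly in proportion to `m`.

```python
import random
import statistics


def update_stats(m, eta_tilde=0.01, mu=0.5, sigma=1.0, trials=20000, seed=0):
    """Simulate SGD weight updates for mini-batch size m on a toy scalar problem.

    Assumption (illustration only): per-example gradients are i.i.d. samples
    from N(mu, sigma^2). The learning rate is eta = eta_tilde * m, so one
    update is -eta * mean(grads) = -eta_tilde * sum(grads).
    Returns (mean update per gradient calculation, variance of the update).
    """
    rng = random.Random(seed)  # fixed seed for repeatable estimates
    eta = eta_tilde * m        # learning rate scaled linearly with batch size
    updates = []
    for _ in range(trials):
        grads = [rng.gauss(mu, sigma) for _ in range(m)]
        updates.append(-eta * sum(grads) / m)
    # Each update consumes m gradient calculations, so divide the mean by m.
    mean_per_gradient = statistics.fmean(updates) / m
    variance = statistics.pvariance(updates)
    return mean_per_gradient, variance


if __name__ == "__main__":
    for m in (2, 8, 32):
        mean_pg, var = update_stats(m)
        print(f"m={m:3d}  mean update per gradient={mean_pg:+.5f}  "
              f"update variance={var:.6f}")
```

In this toy setting the analytic values are `-eta_tilde * mu` for the mean update per gradient and `eta_tilde**2 * m * sigma**2` for the update variance, which is the linear dependence on $m$ that the abstract points out. Real networks have correlated, non-Gaussian gradients, so the simulation only illustrates the scaling argument.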

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques