AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Aditya Devarakonda; Maxim Naumov; Michael Garland

arXiv:1712.02029 · cs.LG · February 15, 2018 · 105 cites

TL;DR

This paper introduces AdaBatch, an adaptive batch size method for training deep neural networks that combines the convergence behavior of small batches with the computational efficiency of large batches.

Contribution

AdaBatch increases the batch size during training instead of fixing a single batch size for all epochs. This retains the convergence rate of small batches while approaching the computational efficiency of large batches.
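
One simple way to realize a batch size that grows during training is a step schedule that multiplies the batch size at fixed epoch intervals. The sketch below is a hypothetical illustration, not the authors' implementation: the function name, the starting size of 128, the doubling factor, the 20-epoch interval, and the 4096 cap are all assumed values.

```python
def adaptive_batch_size(epoch: int, initial: int = 128, factor: int = 2,
                        interval: int = 20, max_batch: int = 4096) -> int:
    """Batch size for a 0-indexed epoch under a step-wise growth schedule.

    Hypothetical schedule: every `interval` epochs the batch size is
    multiplied by `factor`, capped at `max_batch` (all values assumed).
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Number of growth intervals completed before this epoch.
    stages = epoch // interval
    return min(initial * factor ** stages, max_batch)


print([adaptive_batch_size(e) for e in (0, 19, 20, 40, 60)])
```

With these assumed defaults the call prints `[128, 128, 256, 512, 1024]`; the cap keeps very long runs from requesting batches that no longer fit in GPU memory.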

Findings

1. Improves training performance by a factor of up to 6.25 on 4 NVIDIA Tesla P100 GPUs.
2. Keeps accuracy within 1% of fixed batch size training.
3. Evaluated on AlexNet, ResNet, and VGG networks across the CIFAR-10, CIFAR-100, and ImageNet datasets.

Abstract

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR-10, CIFAR-100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while…
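
The efficiency argument in the abstract comes down to step counts: every epoch processes the same number of samples, so a larger batch means fewer, larger optimizer steps, each with more work to spread across GPUs. The following sketch counts steps for two hypothetical 80-epoch schedules on a 50,000-image training set (the size of the CIFAR-10 and CIFAR-100 training splits). The schedules and helper names are assumptions, and step counts are only a proxy: actual wall-clock speedup depends on hardware and is not modeled here.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # The final, partial batch still costs one optimizer step.
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples: int, batch_sizes: list[int]) -> int:
    """Total optimizer steps when epoch i uses batch_sizes[i]."""
    return sum(steps_per_epoch(num_samples, b) for b in batch_sizes)


NUM_SAMPLES = 50_000  # CIFAR-10/100 training split size

# Hypothetical schedules: a fixed small batch vs. doubling every 20 epochs.
fixed = [128] * 80
adaptive = [128] * 20 + [256] * 20 + [512] * 20 + [1024] * 20

print(total_steps(NUM_SAMPLES, fixed), total_steps(NUM_SAMPLES, adaptive))
```

Here the adaptive schedule needs 14,680 steps versus 31,280 for the fixed one, because later epochs run at 196, 98, and 49 steps instead of 391.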

Code & Models

Repositories

GXU-GMU-MICCAI/AdaBatch-numerical-experiments (PyTorch)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Kaiming Initialization · Residual Connection · Residual Block · Local Response Normalization