AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda, Maxim Naumov, Michael Garland

TL;DR
This paper introduces AdaBatch, a training method that adaptively increases the batch size during training to combine the convergence behavior of small batches with the computational efficiency of large batches.
Contribution
AdaBatch adaptively increases the batch size during training, delivering the convergence rate of small batch sizes with computational performance similar to that of large batch sizes.
Findings
Improves training performance by a factor of up to 6.25 on 4 NVIDIA Tesla P100 GPUs.
Maintains less than 1% accuracy difference from fixed batch size training.
Evaluated on the AlexNet, ResNet, and VGG networks with the CIFAR-10, CIFAR-100, and ImageNet datasets.
Abstract
Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR-10, CIFAR-100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while…
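The idea of increasing the batch size during training, rather than fixing it for all epochs, can be illustrated with a minimal Python sketch. The abstract does not specify the schedule, so everything here is an assumption made for illustration: the function names, the doubling factor, the 20-epoch interval, the starting size of 128, and the cap are hypothetical and are not taken from the paper.

```python
def batch_size_schedule(num_epochs, initial_batch_size=128, growth_factor=2,
                        interval=20, max_batch_size=2048):
    """Return one batch size per epoch.

    Assumed schedule (not from the paper): multiply the batch size by
    `growth_factor` every `interval` epochs, capped at `max_batch_size`.
    """
    sizes = []
    for epoch in range(num_epochs):
        size = initial_batch_size * growth_factor ** (epoch // interval)
        sizes.append(min(size, max_batch_size))
    return sizes


def iterations_per_epoch(dataset_size, batch_size):
    """Number of parameter updates needed to cover the dataset once."""
    return -(-dataset_size // batch_size)  # ceiling division


if __name__ == "__main__":
    schedule = batch_size_schedule(num_epochs=60)
    for epoch in (0, 20, 40):
        steps = iterations_per_epoch(50_000, schedule[epoch])
        print(f"epoch {epoch}: batch size {schedule[epoch]}, {steps} steps")
```

Under these assumed settings, early epochs use small batches and many updates per epoch, while later epochs use larger batches and fewer, more parallel updates, which is the trade-off the abstract describes.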
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Kaiming Initialization · Residual Connection · Residual Block · Local Response Normalization
