Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Soham De; Abhay Yadav; David Jacobs; Tom Goldstein

arXiv:1610.05792 · cs.LG · April 10, 2017 · 39 cites

TL;DR

This paper introduces adaptive "big batch" SGD algorithms that grow the batch size over time to keep the signal-to-noise ratio of the gradient approximation nearly constant, which enables automated learning rate selection and removes the need for stepsize decay.

Contribution

The paper presents a big batch SGD scheme that adaptively increases the batch size, achieving convergence rates similar to those of classical SGD without requiring convexity of the objective or stepsize decay.

Findings

01. Maintains a nearly constant signal-to-noise ratio in gradients.

02. Enables automated learning rate selection.

03. Achieves convergence rates comparable to classical SGD.

Abstract

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative "big batch" SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.
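
The batch-growing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a particular noise test (the estimated variance of the batch-mean gradient must stay below theta^2 times its squared norm), a batch-doubling rule, a fixed stepsize, and a synthetic least-squares problem. The name big_batch_sgd and the parameters theta, batch0, and lr are hypothetical and chosen only for illustration.

```python
import numpy as np


def big_batch_sgd(A, b, theta=0.5, batch0=8, lr=0.05, iters=200, seed=0):
    """Least-squares SGD whose batch grows until an assumed noise test passes."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    batch = batch0
    history = []  # batch size used at each iteration
    for _ in range(iters):
        while True:
            idx = rng.choice(n, size=batch, replace=False)
            # Per-sample gradients of 0.5 * (a_i . x - b_i)^2, shape (batch, d).
            residual = A[idx] @ x - b[idx]
            grads = residual[:, None] * A[idx]
            g = grads.mean(axis=0)
            # Estimated variance of the batch-mean gradient:
            # trace of the sample covariance divided by the batch size.
            noise = grads.var(axis=0, ddof=1).sum() / batch
            # Assumed test: accept when noise is small relative to signal,
            # or when the batch already covers the whole dataset.
            if noise <= theta**2 * (g @ g) or batch >= n:
                break
            batch = min(2 * batch, n)  # assumed growth rule: double the batch
        x -= lr * g  # constant stepsize, no decay schedule
        history.append(batch)
    return x, history


# Synthetic problem (illustrative data, not from the paper).
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(2000, 5))
x_true = data_rng.normal(size=5)
b = A @ x_true + 0.1 * data_rng.normal(size=2000)

x_hat, history = big_batch_sgd(A, b)
print("first batch:", history[0], "last batch:", history[-1])
print("error:", np.linalg.norm(x_hat - x_true))
```

In this sketch the batch size never shrinks, so the recorded history is non-decreasing: small batches suffice while the gradient is large, and the batch grows as the iterates approach a solution and the gradient signal shrinks relative to its noise.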

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent