Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

Prateek Jain; Sham M. Kakade; Rahul Kidambi; Praneeth Netrapalli; Aaron Sidford

arXiv:1610.03774 · stat.ML · August 1, 2018 · 89 cites

TL;DR

This paper provides a detailed analysis of mini-batching and tail-averaging in stochastic gradient descent for least squares regression, demonstrating near-linear parallelization speedups and insights into noise effects on convergence.

Contribution

It offers non-asymptotic excess risk bounds for averaging schemes, characterizes parallelization speedups, and analyzes the impact of noise on stepsize choices in SGD.

Findings

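01. Mini-batching reduces the variance of the stochastic gradient estimate and, to a problem-dependent extent, yields near-linear parallelization speedups over SGD with batch size one.

02. Tail-averaging, which averages the final few iterates of SGD, decreases the variance in the final iterate.

03. The analysis shows how the properties of the noise influence the choice of stepsize.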

Abstract

This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD). In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averaging the final few iterates of SGD to decrease the variance in SGD's final iterate. This work presents non-asymptotic excess risk bounds for these schemes for the stochastic approximation problem of least squares regression. Furthermore, this work establishes a precise problem-dependent extent to which mini-batch SGD yields provable near-linear parallelization speedups over SGD with batch size one. This allows for understanding learning rate versus batch size tradeoffs for the final iterate of…
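The two averaging schemes named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its stepsize prescription, neither of which appears in this excerpt. It is a plain NumPy illustration under stated assumptions: the function name `minibatch_tail_averaged_sgd`, its parameters, the synthetic data, and the choice to sample with replacement from a fixed dataset (rather than drawing fresh samples, as in the stochastic approximation setting) are all hypothetical.

```python
import numpy as np


def minibatch_tail_averaged_sgd(X, y, batch_size, step_size, n_steps, tail_start, seed=0):
    """Mini-batch SGD on the squared loss 0.5 * (x @ w - y)**2 with tail-averaging.

    Returns the final iterate and the average of iterates from step
    `tail_start` onward. All names here are illustrative assumptions.
    """
    if not 0 <= tail_start < n_steps:
        raise ValueError("tail_start must satisfy 0 <= tail_start < n_steps")
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    tail_sum = np.zeros(d)
    tail_count = 0
    for t in range(n_steps):
        # Mini-batching: average the stochastic gradient over batch_size samples.
        idx = rng.integers(0, n, size=batch_size)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size
        w = w - step_size * grad
        # Tail-averaging: accumulate only the later iterates.
        if t >= tail_start:
            tail_sum += w
            tail_count += 1
    return w, tail_sum / tail_count


# Synthetic well-specified problem (assumed for illustration only).
data_rng = np.random.default_rng(1)
w_star = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = data_rng.standard_normal((2000, 5))
y = X @ w_star + 0.1 * data_rng.standard_normal(2000)

w_last, w_avg = minibatch_tail_averaged_sgd(
    X, y, batch_size=8, step_size=0.05, n_steps=2000, tail_start=1000
)
print("final-iterate error:", np.linalg.norm(w_last - w_star))
print("tail-averaged error:", np.linalg.norm(w_avg - w_star))
```

Each gradient in a mini-batch can be computed independently, which is where the parallelization comes from. Starting the average at `tail_start` keeps the early iterates, which are still far from the solution, out of the returned estimate.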

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Statistical Methods and Inference

Methods: Stochastic Gradient Descent