Beating SGD Saturation with Tail-Averaging and Minibatching

Nicole Mücke; Gergely Neu; Lorenzo Rosasco

arXiv:1902.08668 · stat.ML · May 28, 2019 · 5 cites

Open Access

TL;DR

This paper analyzes how tail-averaging and minibatching in SGD improve convergence rates and learning errors in nonparametric least squares, providing practical insights and novel theoretical results.

Contribution

It demonstrates that tail averaging outperforms uniform averaging in nonparametric settings and shows how combining tail-averaging with minibatching enables more aggressive step-size choices.

Findings

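01. Tail averaging achieves faster convergence rates than uniform averaging in the nonparametric setting.

02. Combining tail-averaging with minibatching allows larger step sizes than either technique alone.

03. The analysis gives practical guidance on combining multiple passes, minibatching, and averaging in SGD for nonparametric least squares.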

Abstract

While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail averaging. Our results show how these different variants of SGD can be combined to achieve optimal learning errors, hence providing practical insights. In particular, we show for the first time in the literature that tail averaging allows faster convergence rates than uniform averaging in the nonparametric setting. Finally, we show that a combination of tail-averaging and minibatching allows more aggressive step-size choices than using any one of said components.
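
To make the two averaging schemes in the abstract concrete, here is a minimal sketch of minibatch SGD for least squares that returns both the uniform average and the tail average of the iterates. It is not the authors' algorithm or setting: it assumes a finite-dimensional, noise-free linear model instead of the nonparametric setting studied in the paper, and the function name `minibatch_sgd_averages`, its parameters, and all numeric values are hypothetical choices for illustration.

```python
import numpy as np


def minibatch_sgd_averages(X, y, step_size, batch_size, n_steps, tail_start, seed=0):
    """Minibatch SGD on the least-squares objective.

    Returns (uniform_avg, tail_avg), where uniform_avg averages all iterates
    w_1..w_T and tail_avg averages only the iterates with index t > tail_start.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    uniform_sum = np.zeros(d)
    tail_sum = np.zeros(d)
    for t in range(1, n_steps + 1):
        # Draw a minibatch uniformly with replacement (an assumption of this sketch).
        idx = rng.integers(0, n, size=batch_size)
        Xb, yb = X[idx], y[idx]
        # Gradient of 0.5 * mean squared residual over the minibatch.
        grad = Xb.T @ (Xb @ w - yb) / batch_size
        w = w - step_size * grad
        uniform_sum += w
        if t > tail_start:
            tail_sum += w
    return uniform_sum / n_steps, tail_sum / (n_steps - tail_start)


# Synthetic, noise-free linear data (illustrative only).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 5))
w_star = data_rng.standard_normal(5)
y = X @ w_star

uniform_avg, tail_avg = minibatch_sgd_averages(
    X, y, step_size=0.05, batch_size=10, n_steps=500, tail_start=250
)
print("uniform-average error:", np.linalg.norm(uniform_avg - w_star))
print("tail-average error:   ", np.linalg.norm(tail_avg - w_star))
```

In this toy setup the uniform average still carries the early iterates, which sit close to the zero initialization, while the tail average discards them. This is only an intuition for why discarding early iterates can help; the rates and step-size conditions are those established in the paper, not anything this sketch demonstrates.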

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent