Variance-Reduced Methods for Machine Learning

Robert M. Gower; Mark Schmidt; Francis Bach; Peter Richtarik

arXiv:2010.00892 · cs.LG · October 5, 2020

TL;DR

This paper reviews variance reduction techniques in stochastic optimization, highlighting their theoretical and practical advantages over traditional SGD, especially in convex settings where more than one pass over the training data is allowed.

Contribution

It provides a comprehensive overview of key principles and developments in variance reduction methods for finite data sets, aimed at non-experts.

Findings

1. VR methods converge faster than SGD when more than one pass over the training data is allowed
2. VR techniques are effective in convex optimization
3. Interest and research in VR methods are growing rapidly

Abstract

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in theory as well as in practice. These speedups underlie the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting, and leave pointers for readers interested in extensions to minimizing non-convex functions.
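To make the contrast with SGD concrete, here is a minimal sketch of SVRG (stochastic variance-reduced gradient; Johnson and Zhang, 2013), one well-known VR method, on a small synthetic least-squares problem f(w) = (1/n) sum_i 0.5 (a_i . w - b_i)^2. It is an illustration, not code from the review: the data generator `make_data`, the step size 1/(10 L_max), the inner-loop length m = 2n and the use of the last inner iterate as the next snapshot are assumptions made for this example. Each inner step uses the estimator g = grad f_i(w) - grad f_i(w~) + grad f(w~), which is unbiased and whose variance shrinks as w and the snapshot w~ approach the minimizer; plain SGD uses grad f_i(w) alone, so with a constant step size its noise never dies out.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(v):
    return dot(v, v) ** 0.5


def make_data(n=50, d=3, noise=0.5, seed=0):
    # Hypothetical synthetic data (an assumption): b_i = a_i . w_true + Gaussian noise.
    rng = random.Random(seed)
    w_true = [rng.uniform(-2.0, 2.0) for _ in range(d)]
    A = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    B = [dot(a, w_true) + rng.gauss(0.0, noise) for a in A]
    return A, B


def grad_i(w, a, b):
    # Gradient of the i-th loss f_i(w) = 0.5 * (a . w - b)^2.
    r = dot(a, w) - b
    return [r * x for x in a]


def full_grad(w, A, B):
    # Gradient of f(w) = (1/n) * sum_i f_i(w); costs n individual gradients.
    n = len(A)
    g = [0.0] * len(w)
    for a, b in zip(A, B):
        for j, gj in enumerate(grad_i(w, a, b)):
            g[j] += gj / n
    return g


def sgd(A, B, step, iters, seed=1):
    rng = random.Random(seed)
    w = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.randrange(len(A))
        g = grad_i(w, A[i], B[i])  # unbiased, but its variance does not vanish at the minimizer
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w


def svrg(A, B, step, epochs, m, seed=1):
    rng = random.Random(seed)
    w = [0.0] * len(A[0])
    for _ in range(epochs):
        snap = list(w)               # snapshot point w~ (here: the last inner iterate)
        mu = full_grad(snap, A, B)   # exact full gradient at the snapshot
        for _ in range(m):
            i = rng.randrange(len(A))
            gi = grad_i(w, A[i], B[i])
            gs = grad_i(snap, A[i], B[i])
            # Variance-reduced estimate: unbiased, variance -> 0 as w and snap -> minimizer.
            g = [x - y + z for x, y, z in zip(gi, gs, mu)]
            w = [wj - step * gj for wj, gj in zip(w, g)]
    return w


A, B = make_data()
n = len(A)
L_max = max(dot(a, a) for a in A)  # largest smoothness constant among the f_i
step = 1.0 / (10.0 * L_max)        # assumed step size, shared by both methods
m, epochs = 2 * n, 50
budget = epochs * (n + 2 * m)      # individual gradient evaluations used by SVRG

w_sgd = sgd(A, B, step, budget)
w_svrg = svrg(A, B, step, epochs, m)
gn_sgd = norm(full_grad(w_sgd, A, B))
gn_svrg = norm(full_grad(w_svrg, A, B))
print(f"||grad f|| after SGD:  {gn_sgd:.2e}")
print(f"||grad f|| after SVRG: {gn_svrg:.2e}")
```

Both runs receive the same budget of individual gradient evaluations (one SVRG epoch costs n + 2m). Because the labels are noisy, the individual gradients do not vanish at the minimizer, so the constant-step SGD iterate keeps fluctuating and its full-gradient norm stalls, while on this strongly convex problem the SVRG gradient norm keeps shrinking geometrically from epoch to epoch.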

Taxonomy

Methods: Stochastic Gradient Descent