Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

Deanna Needell; Nathan Srebro; Rachel Ward

arXiv:1310.5715 · math.NA · January 19, 2015 · 34 cites

Open Access

TL;DR

This paper tightens the convergence-rate analysis of stochastic gradient descent (SGD) for smooth, strongly convex functions, shows how importance sampling improves that rate further, and connects SGD with the randomized Kaczmarz algorithm.

Contribution

It provides a tighter finite-sample convergence guarantee for SGD, uses importance sampling to improve that guarantee further, and establishes a link between SGD and the randomized Kaczmarz algorithm.

Findings

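01. Linear convergence of SGD with a linear, rather than quadratic, dependence on the conditioning $L / \mu$

02. Importance sampling (reweighting the sampling distribution) further improves convergence

03. Connection between SGD and the randomized Kaczmarz algorithm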
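To make the third finding concrete, here is a short worked derivation in the standard least-squares setting. The notation ($a_i$ for the rows of $A$, $p_i$ for the sampling probabilities, $\gamma$ for the step size) is assumed for illustration and does not appear on this page. Consider a consistent system $Ax = b$ and the objective

$$F(x) = \tfrac{1}{2}\|Ax - b\|_2^2 = \sum_{i} \tfrac{1}{2}\left(\langle a_i, x\rangle - b_i\right)^2 .$$

Sampling row $i$ with probability $p_i = \|a_i\|_2^2 / \|A\|_F^2$ and reweighting each term so that the objective is unchanged in expectation gives

$$F(x) = \mathbb{E}_{i \sim p}\left[f_i(x)\right], \qquad f_i(x) = \frac{1}{2 p_i}\left(\langle a_i, x\rangle - b_i\right)^2, \qquad \nabla f_i(x) = \frac{\|A\|_F^2}{\|a_i\|_2^2}\left(\langle a_i, x\rangle - b_i\right) a_i .$$

An SGD step with step size $\gamma = 1 / \|A\|_F^2$ then reads

$$x_{k+1} = x_k - \gamma \nabla f_i(x_k) = x_k - \frac{\langle a_i, x_k\rangle - b_i}{\|a_i\|_2^2}\, a_i ,$$

which is the familiar randomized Kaczmarz update with rows drawn in proportion to their squared norms.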

Abstract

We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L / \mu)^{2}$ (where $L$ is a bound on the smoothness and $\mu$ on the strong convexity) to a linear dependence on $L / \mu$. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence also in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. In…
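The link between weighted sampling and the randomized Kaczmarz algorithm can be illustrated with a minimal sketch. It assumes the standard textbook form of randomized Kaczmarz for a consistent linear system, with rows sampled in proportion to their squared norms. The function name `randomized_kaczmarz`, its parameters, and the small test system are hypothetical choices made for this example and are not taken from the paper.

```python
import random


def randomized_kaczmarz(A, b, num_iters=500, seed=0):
    """Sketch of randomized Kaczmarz for a consistent system A x = b.

    Rows are drawn with probability proportional to their squared norm
    (a weighted-sampling assumption of this sketch), and each step
    projects the iterate onto the hyperplane <a_i, x> = b_i.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Squared Euclidean norm of every row; these double as sampling weights.
    row_norms_sq = [sum(v * v for v in row) for row in A]
    indices = range(len(A))
    x = [0.0] * n
    for _ in range(num_iters):
        # Weighted (importance) sampling of one row index.
        i = rng.choices(indices, weights=row_norms_sq)[0]
        # Signed residual of the sampled equation at the current iterate.
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        # Projection step: equals an SGD step on the reweighted
        # least-squares term for row i.
        scale = residual / row_norms_sq[i]
        x = [xj - scale * a for xj, a in zip(x, A[i])]
    return x


# Small consistent system whose exact solution is x = [1, 2].
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [0.0, 1.0]]
b = [4.0, 7.0, -1.0, 2.0]
x_hat = randomized_kaczmarz(A, b)
print(x_hat)
```

Sampling uniformly instead (dropping `weights=`) still converges on this example. Weighting by squared row norm is the choice that corresponds to importance sampling in the SGD view.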

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent