Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm
Deanna Needell, Nathan Srebro, Rachel Ward

TL;DR
This paper tightens the known convergence guarantees for stochastic gradient descent (SGD) on smooth, strongly convex functions, shows how importance sampling improves them further, and connects SGD with the randomized Kaczmarz algorithm.
Contribution
It provides a tighter finite-sample convergence guarantee for SGD, introduces importance sampling for further improvements, and establishes a novel link between SGD and the randomized Kaczmarz algorithm.
Findings
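Linear convergence with a linear, rather than quadratic, dependence on the conditioning L/μ
Importance sampling (reweighting the sampling distribution) yields a dependence on the average smoothness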
Connection between SGD and Kaczmarz algorithm
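The link between SGD and Kaczmarz can be illustrated with a minimal sketch, which is not taken from the paper. It assumes a consistent linear system Ax = b and uses the standard randomized Kaczmarz rule of sampling row i with probability proportional to its squared norm. Each update can be read as an SGD step on the least squares objective under a reweighted sampling distribution. The function name, iteration count, and test problem are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Squared row norms define the importance-sampling distribution.
    row_norms_sq = np.sum(A ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        # Sample one row (one component of the objective) by its weight.
        i = rng.choice(m, p=probs)
        # Project x onto the hyperplane {z : a_i . z = b_i}; this is a
        # gradient step on 0.5 * (a_i . x - b_i)^2 with step 1 / ||a_i||^2.
        residual = b[i] - A[i] @ x
        x += (residual / row_norms_sq[i]) * A[i]
    return x

# Hypothetical test problem: a small, well-conditioned Gaussian system.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
print(np.linalg.norm(x_hat - x_true))
```

Sampling rows uniformly instead of by squared norm gives plain SGD with per-row step sizes; the weighted choice is what makes the convergence rate depend on an average, rather than worst-case, row norm.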
Abstract
We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning (L/μ)^2 (where L is a bound on the smoothness and μ on the strong convexity) to a linear dependence on L/μ. Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence also in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. In…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
