On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

TL;DR
This paper develops a unifying framework for variance reduction techniques in stochastic gradient descent, introduces an asynchronous algorithm with proven fast convergence, and demonstrates near-linear speedup in large-scale machine learning tasks.
Contribution
The paper provides the first unified framework for asynchronous variance reduction algorithms and introduces an asynchronous SVRG with theoretical convergence guarantees.
Findings
Achieves near-linear speedup in sparse settings.
Proves fast convergence of the proposed asynchronous algorithm.
Demonstrates empirical effectiveness of asynchronous SVRG.
Abstract
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms---a crucial requirement for modern large-scale applications---have not been studied. We bridge this gap by presenting a unifying framework for many variance reduction techniques. Subsequently, we propose an asynchronous algorithm grounded in our framework, and prove its fast convergence. An important consequence of our general approach is that it yields asynchronous versions of variance reduction algorithms such as SVRG and SAGA as a byproduct. Our method achieves near linear speedup in sparse settings common to machine learning. We…
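To make the variance reduction idea concrete, here is a minimal sketch of plain serial SVRG, one of the algorithms the abstract names. It is not the asynchronous algorithm proposed in the paper, and it is not taken from the paper's code. The least-squares objective, the synthetic data, the step size, and every function name (`grad_i`, `full_grad`, `svrg`, `make_data`) are assumptions chosen for illustration.

```python
import random


def grad_i(w, x, y):
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2 with respect to w."""
    residual = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [residual * xj for xj in x]


def full_grad(w, X, Y):
    """Average of the per-sample gradients over the whole dataset."""
    total = [0.0] * len(w)
    for x, y in zip(X, Y):
        total = [t + gj for t, gj in zip(total, grad_i(w, x, y))]
    return [t / len(X) for t in total]


def svrg(X, Y, step=0.05, epochs=40, inner_steps=200, seed=0):
    """Serial SVRG on least squares (illustrative defaults, not tuned)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        # Take a snapshot and compute its full gradient once per epoch.
        snapshot = list(w)
        mu = full_grad(snapshot, X, Y)
        for _ in range(inner_steps):
            i = rng.randrange(len(X))
            g_now = grad_i(w, X[i], Y[i])
            g_snap = grad_i(snapshot, X[i], Y[i])
            # Variance-reduced direction: unbiased for the full gradient,
            # with variance that shrinks as w and snapshot approach the optimum.
            w = [wj - step * (a - b + c)
                 for wj, a, b, c in zip(w, g_now, g_snap, mu)]
    return w


def make_data(n=50, noise=0.1, seed=1):
    """Hypothetical synthetic regression data with two features."""
    rng = random.Random(seed)
    w_true = [2.0, -1.0]
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    Y = [sum(a * b for a, b in zip(w_true, x)) + rng.gauss(0, noise) for x in X]
    return X, Y


if __name__ == "__main__":
    X, Y = make_data()
    w = svrg(X, Y)
    print("weights:", w)
    print("full gradient at solution:", full_grad(w, X, Y))
```

The correction term `g_snap - mu` is what separates this from plain SGD: it has zero mean over the sampled index, so the step stays unbiased while its variance decays, which is what allows a constant step size. An asynchronous variant would have several workers apply the inner update to shared parameters without locking; that part is not shown here.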
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: SAGA · Stochastic Gradient Descent
