Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Zeyuan Allen-Zhu

arXiv:1603.05953 · math.OC · September 25, 2018

TL;DR

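Katyusha is a stochastic gradient method for convex finite-sum optimization that achieves an optimal accelerated convergence rate and an optimal parallel linear speedup in the mini-batch setting. It does so with Katyusha momentum, a "negative momentum" added on top of Nesterov's momentum, which on its own does not provide acceleration in the stochastic setting.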

Contribution

The paper introduces Katyusha, a primal-only stochastic gradient method with a novel negative momentum, providing optimal acceleration and parallel speedup in convex finite-sum optimization.

Findings

01 Achieves optimal accelerated convergence rate

02 Enjoys optimal parallel linear speedup

03 Incorporates a novel negative momentum technique

Abstract

Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex and finite-sum. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. In convex finite-sum stochastic optimization, Katyusha has an optimal accelerated convergence rate, and enjoys an optimal parallel linear speedup in the mini-batch setting. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum. It can be incorporated into a variance-reduction based algorithm and speed it up, both in terms of sequential and parallel performance. Since variance…
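To make the idea of "negative momentum on top of Nesterov's momentum" concrete, here is a minimal sketch of a Katyusha-style loop on ridge regression. The abstract does not give the algorithm, so everything specific here is an assumption: the function name `katyusha_ridge`, the test problem, and the parameter choices (epoch length m = 2n, τ2 = 1/2, τ1 = min(sqrt(mσ/(3L)), 1/2), α = 1/(3 τ1 L)), which are recalled from the full paper and should be checked against it. The term `tau2 * x_tilde` is the negative momentum: it pulls each query point back toward the snapshot where the full gradient was computed.

```python
import numpy as np

def katyusha_ridge(A, b, sigma, epochs=100, seed=0):
    """Sketch: minimize (1/n) * sum_i 0.5*(a_i.x - b_i)^2 + (sigma/2)*||x||^2.

    Hypothetical helper for illustration; parameter choices are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = float(np.max(np.sum(A * A, axis=1)))  # each f_i is L-smooth
    m = 2 * n                                 # inner iterations per epoch
    tau2 = 0.5                                # weight of the negative momentum
    tau1 = min(np.sqrt(m * sigma / (3.0 * L)), 0.5)
    alpha = 1.0 / (3.0 * tau1 * L)

    x_tilde = np.zeros(d)                     # snapshot point
    y = x_tilde.copy()                        # gradient-step sequence
    z = x_tilde.copy()                        # mirror-step sequence

    # Weights for the epoch average that defines the next snapshot.
    weights = (1.0 + alpha * sigma) ** np.arange(m)
    weights /= weights.sum()

    for _ in range(epochs):
        mu = A.T @ (A @ x_tilde - b) / n      # full gradient at the snapshot
        y_avg = np.zeros(d)
        for j in range(m):
            # Nesterov-style coupling of z and y, plus the pull toward x_tilde.
            x = tau1 * z + tau2 * x_tilde + (1.0 - tau1 - tau2) * y
            i = rng.integers(n)
            a = A[i]
            # Variance-reduced gradient: mu + grad f_i(x) - grad f_i(x_tilde).
            g = mu + a * (a @ (x - x_tilde))
            # Proximal steps for psi(x) = (sigma/2)*||x||^2, in closed form.
            z = (z - alpha * g) / (1.0 + alpha * sigma)
            y = (3.0 * L * x - g) / (3.0 * L + sigma)
            y_avg += weights[j] * y
        x_tilde = y_avg
    return x_tilde
```

On a small random problem, the returned point can be compared against the closed-form ridge solution `np.linalg.solve(A.T @ A / n + sigma * np.eye(d), A.T @ b / n)`. Setting `tau2 = 0` removes the pull toward the snapshot and leaves only the Nesterov-style coupling, which is one way to see what the extra term contributes.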
