STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization
Kfir Y. Levy, Ali Kavis, Volkan Cevher

TL;DR
This paper introduces STORM+, an adaptive, parameter-free stochastic gradient method with momentum that achieves optimal convergence rates for nonconvex optimization without requiring prior knowledge of smoothness or large batch sizes.
Contribution
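STORM+ extends the STORM algorithm of Cutkosky and Orabona (2019) by setting its learning rate and momentum parameters adaptively. Like STORM, it needs neither anchor points nor large batch sizes; unlike STORM, it does not need the smoothness parameter to be known in advance, which removes a hyperparameter tuning step in nonconvex stochastic optimization.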
Findings
Achieves the optimal $O(1/T^{1/3})$ convergence rate for finding an approximate stationary point, where $T$ is the number of iterations.
Does not require knowledge of smoothness or large batch sizes.
Provides a practical, fully adaptive optimization method.
Abstract
In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require a careful maintenance of anchor points in conjunction with appropriately selected "mega-batchsizes". This leads to a challenging hyperparameter tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batchsizes, and still obtain the optimal rate for this setting. Yet their method, called STORM, crucially relies on the knowledge of the smoothness,…
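To make the recursive momentum idea from the abstract concrete, here is a minimal sketch of a STORM-style gradient estimator on a toy one-dimensional nonconvex problem. It is an illustration under stated assumptions, not the authors' implementation: the toy loss, the function names, and the constant values `lr` and `a` are all hypothetical. In particular, STORM+ chooses the learning rate and momentum parameter adaptively from observed gradients, and that adaptive rule is not reproduced here; fixed constants stand in for it.

```python
import random


def stoch_grad(x, xi):
    # Stochastic gradient of the toy loss f(x; xi) = x^2 / (1 + x^2) + xi * x.
    # The toy loss is an assumption made for illustration only.
    return 2.0 * x / (1.0 + x * x) ** 2 + xi


def true_grad(x):
    # Gradient of the expected loss (the noise xi has zero mean).
    return 2.0 * x / (1.0 + x * x) ** 2


def recursive_momentum_sgd(x0, steps, lr=0.05, a=0.1, noise=0.1, seed=0):
    # lr and a are fixed placeholder constants (hypothetical values);
    # STORM+ would set both adaptively instead.
    rng = random.Random(seed)
    x_prev = x0
    # The first estimate is a plain stochastic gradient.
    d = stoch_grad(x_prev, rng.gauss(0.0, noise))
    x = x_prev - lr * d
    for _ in range(steps - 1):
        xi = rng.gauss(0.0, noise)
        g_new = stoch_grad(x, xi)       # gradient at the current iterate
        g_old = stoch_grad(x_prev, xi)  # same sample, previous iterate
        # Recursive momentum: correct the old estimate with the gradient
        # difference evaluated on one shared sample, so no anchor point
        # or large batch is needed.
        d = g_new + (1.0 - a) * (d - g_old)
        x_prev, x = x, x - lr * d
    return x


if __name__ == "__main__":
    x_final = recursive_momentum_sgd(1.0, 2000)
    print("final iterate:", x_final)
    print("gradient magnitude:", abs(true_grad(x_final)))
```

Each iteration draws a single sample and evaluates the stochastic gradient at two consecutive iterates, which is how this style of estimator avoids the anchor points and "mega-batchsizes" that the abstract attributes to classical variance reduction.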
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
