META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions
Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy L. Nguyen

TL;DR
META-STORM introduces a fully adaptive variance reduction algorithm for non-convex stochastic optimization that removes the bounded function value assumption, improves convergence rates, and is effective in deep learning applications.
Contribution
It generalizes STORM+ by removing the bounded function assumption while maintaining optimal convergence and full adaptivity, enhancing flexibility and performance.
Findings
META-STORM achieves optimal convergence rates without bounded function assumptions.
The method outperforms previous VR algorithms in deep learning tasks.
META-STORM is competitive with widely used algorithms with added heuristics.
Abstract
We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. There, STORM utilizes recursive momentum to achieve the VR effect and is then later made fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the requirement for obtaining certain problem-specific parameters such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, excluding a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values…
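The recursive momentum that the abstract credits to STORM can be illustrated with a small sketch. It shows only the estimator update d_t = g(x_t; ξ_t) + (1 − a)(d_{t−1} − g(x_{t−1}; ξ_t)) from Cutkosky and Orabona's STORM, where the same fresh sample ξ_t is evaluated at both the current and the previous iterate. It is not META-STORM itself: the toy objective f(x) = 0.5·x², the constant step size `lr`, the constant momentum parameter `a`, and all function names are assumptions made for illustration, whereas the adaptive methods discussed here set these quantities from observed gradients.

```python
import random


def stochastic_grad(x, noise):
    # Gradient of the toy objective f(x) = 0.5 * x**2 plus additive noise.
    # The objective and the noise model are assumptions for illustration.
    return x + noise


def storm_recursive_momentum(x0, steps, lr=0.1, a=0.1, sigma=1.0, seed=0):
    """Run `steps` updates using a STORM-style recursive momentum estimator.

    lr and a are fixed constants here (an assumption); fully adaptive
    methods would choose them from the observed stochastic gradients.
    """
    rng = random.Random(seed)
    x_prev = x0
    # The first estimate is a plain stochastic gradient.
    d = stochastic_grad(x_prev, rng.gauss(0.0, sigma))
    x = x_prev - lr * d
    for _ in range(steps - 1):
        # One fresh sample, evaluated at both the current and previous point.
        noise = rng.gauss(0.0, sigma)
        g_curr = stochastic_grad(x, noise)
        g_prev = stochastic_grad(x_prev, noise)
        # Recursive momentum: correct the old estimate by the gradient change.
        d = g_curr + (1.0 - a) * (d - g_prev)
        x_prev, x = x, x - lr * d
    return x


if __name__ == "__main__":
    print(storm_recursive_momentum(x0=5.0, steps=200))
```

Because each step needs only one new sample, no large "mega-batch" gradient is ever computed. With `sigma=0.0` the correction term vanishes and the sketch reduces to plain gradient descent, which gives a quick sanity check.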
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
