META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Zijian Liu; Ta Duy Nguyen; Thien Hang Nguyen; Alina Ene; Huy L. Nguyen

arXiv:2209.14853 · cs.LG · September 30, 2022

TL;DR

META-STORM introduces a fully adaptive variance reduction algorithm for non-convex stochastic optimization that removes the bounded function value assumption, improves convergence rates, and is effective in deep learning applications.

Contribution

META-STORM generalizes STORM+ by removing the bounded function value assumption while keeping the optimal convergence rate and full adaptivity, which adds flexibility and improves performance.
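The bounded function value assumption can be made concrete with a short worked statement. The form below is an assumption about how such a condition is typically written (with $f^\ast = \inf_x f(x)$ and some constant $B$), not a quotation from STORM+; the quadratic shows that even a smooth objective on $\mathbb{R}^d$ can violate it:

```latex
\sup_{x \in \mathbb{R}^d} \bigl( f(x) - f^\ast \bigr) \le B < \infty,
\qquad\text{but}\qquad
f(x) = \tfrac{1}{2}\lVert x \rVert^2
\;\Rightarrow\;
f(x) - f^\ast = \tfrac{1}{2}\lVert x \rVert^2 \to \infty
\ \text{ as } \lVert x \rVert \to \infty .
```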

Findings

01. META-STORM achieves optimal convergence rates without the bounded function value assumption.

02. The method outperforms previous VR algorithms on deep learning tasks.

03. With added heuristics, META-STORM is competitive with widely used algorithms.

Abstract

We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. STORM uses recursive momentum to achieve the VR effect, and it was later made fully adaptive in STORM+ [Levy et al., '21]. Full adaptivity removes the need to know problem-specific parameters, such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients, in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, which excludes a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values…
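To make "recursive momentum" concrete, here is a minimal sketch of a STORM-style estimator on a toy problem, assuming f(x) = 0.5·||x||² with additive Gaussian gradient noise. The key step evaluates one fresh sample ξ at both the new and the old iterate: d_{t+1} = ∇f(x_{t+1}; ξ) + (1 - a)(d_t - ∇f(x_t; ξ)). The names `noisy_grad` and `storm_sketch`, the step-size rule `eta`, and the momentum schedule `a` are hypothetical illustrations, not the rules used by STORM+ or META-STORM.

```python
import math
import random


def noisy_grad(x, xi):
    """Hypothetical toy oracle: gradient of f(x) = 0.5 * ||x||^2 plus additive noise xi."""
    return [xj + nj for xj, nj in zip(x, xi)]


def storm_sketch(x0, steps=2000, sigma=0.1, seed=0):
    """STORM-style recursive momentum with illustrative (assumed) schedules."""
    rng = random.Random(seed)
    x = list(x0)
    # d_1 is a plain stochastic gradient at the starting point.
    d = noisy_grad(x, [rng.gauss(0.0, sigma) for _ in x])
    sum_d_sq = sum(v * v for v in d)
    for t in range(1, steps + 1):
        # Assumed AdaGrad-norm-style step size built from past estimator norms.
        eta = 1.0 / math.sqrt(1.0 + sum_d_sq)
        x_new = [xj - eta * dj for xj, dj in zip(x, d)]
        # Assumed momentum weight in (0, 1], decaying like t^(-2/3).
        a = (t + 1) ** (-2.0 / 3.0)
        # One fresh sample, evaluated at BOTH the new and the old iterate.
        xi = [rng.gauss(0.0, sigma) for _ in x]
        g_new = noisy_grad(x_new, xi)
        g_old = noisy_grad(x, xi)
        # Recursive momentum: d_{t+1} = g(x_{t+1}) + (1 - a) * (d_t - g(x_t)).
        d = [gn + (1.0 - a) * (dj - go) for gn, dj, go in zip(g_new, d, g_old)]
        sum_d_sq += sum(v * v for v in d)
        x = x_new
    return x


x_final = storm_sketch([5.0, -3.0])
print(math.sqrt(sum(v * v for v in x_final)))  # small: the iterate approaches the minimizer at 0
```

For this additive-noise toy, the noise cancels in `g_new - g_old`, so the estimator error e_t = d_t - ∇f(x_t) obeys e_{t+1} = (1 - a)e_t + a·ξ and its variance shrinks as `a` decays. Real losses do not give this exact cancellation, which is where the analysis in STORM and its successors does the real work.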

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning