Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Wei Jiang; Sifan Yang; Yibo Wang; Lijun Zhang

arXiv:2406.01959 · math.OC · October 24, 2024

TL;DR

This paper introduces a new adaptive variance reduction method for stochastic optimization that achieves optimal convergence rates under weaker assumptions, extending to compositional and finite-sum problems.

Contribution

The paper proposes an adaptive STORM algorithm that attains the optimal convergence rate under weaker assumptions than existing adaptive variants, and extends the technique to stochastic compositional and finite-sum optimization.

Findings

01

Achieves $O(T^{-1/3})$ convergence for non-convex functions.

02

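Requires weaker assumptions than existing adaptive STORM extensions, which rely on strong conditions such as bounded gradients and bounded function values.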

03

Validates effectiveness through numerical experiments.

Abstract

This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer an additional $O(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an optimal convergence rate of $O(T^{-1/3})$ for non-convex functions with our newly designed learning rate strategy. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $O(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $O(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and…
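
For readers unfamiliar with the STORM technique the abstract builds on, the following is a minimal sketch of the standard (non-adaptive) STORM gradient estimator, d_t = ∇f(x_t; ξ_t) + (1 − a)(d_{t−1} − ∇f(x_{t−1}; ξ_t)). It is not the paper's method: the newly designed learning rate strategy is not described in the abstract, so a constant step size `lr` and constant momentum parameter `a` are assumed here. The toy objective and the names `noisy_quadratic_grad` and `storm` are hypothetical, chosen only for illustration.

```python
import random


def noisy_quadratic_grad(x, xi):
    """Stochastic gradient of the toy objective f(x; xi) = 0.5 * (1 + xi) * x**2.

    With xi zero-mean, the expected objective is 0.5 * x**2, minimized at x = 0.
    This objective is an assumption made for the sketch, not taken from the paper.
    """
    return (1.0 + xi) * x


def storm(grad, x0, steps, lr, a, seed=0):
    """Run the standard STORM recursion with constant lr and a (both assumed).

    The key point is that the SAME sample xi is used to evaluate the gradient
    at the current and previous iterates, so the correction term
    d - grad(x_prev, xi) cancels much of the sampling noise.
    """
    rng = random.Random(seed)
    x_prev = x0
    # Initialize the estimator with a plain stochastic gradient.
    d = grad(x0, rng.uniform(-0.5, 0.5))
    x = x0 - lr * d
    for _ in range(1, steps):
        xi = rng.uniform(-0.5, 0.5)   # one fresh sample per step
        g_new = grad(x, xi)           # gradient at the current iterate
        g_old = grad(x_prev, xi)      # same sample at the previous iterate
        d = g_new + (1.0 - a) * (d - g_old)
        x_prev, x = x, x - lr * d
    return x


if __name__ == "__main__":
    x_final = storm(noisy_quadratic_grad, x0=5.0, steps=500, lr=0.1, a=0.1)
    print(f"final iterate: {x_final:.3e}")
```

Setting `a = 1` reduces the recursion to plain SGD, while smaller values of `a` rely more heavily on the recursive correction. Adaptive variants, including the one this paper proposes, replace the constant `lr` and `a` with quantities computed from observed gradients.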

Videos

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications