Variance Reduction on General Adaptive Stochastic Mirror Descent

Wenjie Li; Zhanyu Wang; Yichen Zhang; Guang Cheng

arXiv:2012.13760 · stat.ML · October 18, 2022 · 1 citation

Open Access

TL;DR

This paper introduces SVRAMD, a general variance reduction framework for adaptive mirror descent algorithms. It proves that variance reduction lowers their SFO complexity in nonsmooth nonconvex finite-sum optimization and supports the theory with deep learning experiments.

Contribution

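The paper proposes a generalized variance reduction framework for adaptive mirror descent. The framework applies to algorithms with time-varying step sizes and to self-adaptive algorithms such as AdaGrad and RMSProp, and it comes with convergence guarantees that improve on the versions without variance reduction.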
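As a rough illustration of what such a framework combines, the following sketch writes out a standard SVRG-style gradient estimator together with a generic adaptive mirror descent step. The notation ($f_i$, $h$, $\tilde{x}$, $I_t$, $\alpha_t$, $\psi_t$) is assumed here for illustration and may differ in detail from the paper's own algorithm.

```latex
% Finite-sum objective with a possibly nonsmooth term h (notation assumed here)
\min_{x \in \mathbb{R}^d} \; F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) + h(x)

% SVRG-style variance-reduced gradient estimate at iterate x_t,
% using a snapshot point \tilde{x} and a mini-batch I_t
v_t = \frac{1}{|I_t|}\sum_{i \in I_t}\bigl(\nabla f_i(x_t) - \nabla f_i(\tilde{x})\bigr)
      + \frac{1}{n}\sum_{i=1}^{n}\nabla f_i(\tilde{x})

% Adaptive mirror descent step with a time-varying proximal function \psi_t
x_{t+1} = \arg\min_{x}\Bigl\{\langle v_t, x\rangle + h(x)
          + \frac{1}{\alpha_t} B_{\psi_t}(x, x_t)\Bigr\},
\qquad
B_{\psi_t}(x, y) = \psi_t(x) - \psi_t(y) - \langle \nabla\psi_t(y),\, x - y\rangle
```

Under these assumptions, choosing $\psi_t(x) = \tfrac{1}{2} x^\top H_t x$ with a diagonal $H_t$ built from accumulated squared gradients gives an AdaGrad-like update, while a fixed $\psi_t(x) = \tfrac{1}{2}\|x\|_2^2$ gives a non-adaptive proximal gradient step.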

Findings

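01 Variance reduction decreases the stochastic first-order oracle (SFO) complexity of adaptive mirror descent algorithms and thus accelerates their convergence.

02 SVRAMD's convergence rates recover the best existing rates of non-adaptive variance reduced mirror descent algorithms.

03 Experimental results in deep learning support the theoretical claims.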

Abstract

In this work, we investigate the idea of variance reduction by studying its properties with general adaptive mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems. We propose a simple yet generalized framework for variance reduced adaptive mirror descent algorithms named SVRAMD and provide its convergence analysis in both the nonsmooth nonconvex problem and the P-L conditioned problem. We prove that variance reduction reduces the SFO complexity of adaptive mirror descent algorithms and thus accelerates their convergence. In particular, our general theory implies that variance reduction can be applied to algorithms using time-varying step sizes and self-adaptive algorithms such as AdaGrad and RMSProp. Moreover, the convergence rates of SVRAMD recover the best existing rates of non-adaptive variance reduced mirror descent algorithms without complicated…
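The idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' SVRAMD implementation: it uses a convex L1-regularized least-squares problem for simplicity (the paper targets nonconvex problems), an SVRG-style estimator, and an AdaGrad-like diagonal proximal function. The function names `svr_adaptive_md` and `objective` and all hyperparameter values are hypothetical.

```python
import numpy as np


def objective(A, b, x, lam):
    """L1-regularized least squares: 0.5 * mean((Ax - b)^2) + lam * ||x||_1."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum()


def svr_adaptive_md(A, b, lam=0.01, alpha=0.1, epochs=20, inner=50,
                    batch=8, eps=1e-8, seed=0):
    """Sketch of variance-reduced adaptive mirror descent (assumed form).

    Outer loop: take a snapshot and compute its full gradient.
    Inner loop: SVRG-style gradient estimate, then a proximal step in an
    AdaGrad-like diagonal metric (soft-thresholding handles the L1 term).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    G = np.zeros(d)  # running sum of squared gradient estimates (AdaGrad-style)
    for _ in range(epochs):
        snap = x.copy()
        full_grad = A.T @ (A @ snap - b) / n  # full gradient at the snapshot
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            Ai, bi = A[idx], b[idx]
            g_x = Ai.T @ (Ai @ x - bi) / batch        # mini-batch gradient at x
            g_snap = Ai.T @ (Ai @ snap - bi) / batch  # same batch at snapshot
            v = g_x - g_snap + full_grad              # variance-reduced estimate
            G += v * v
            H = np.sqrt(G) + eps                      # diagonal metric
            y = x - alpha * v / H                     # scaled gradient step
            # Proximal step for lam * ||x||_1 in the metric diag(H)
            x = np.sign(y) * np.maximum(np.abs(y) - alpha * lam / H, 0.0)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    x_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = svr_adaptive_md(A, b)
    print("initial objective:", objective(A, b, np.zeros(5), 0.01))
    print("final objective:  ", objective(A, b, x_hat, 0.01))
```

Replacing the accumulator update `G += v * v` with an exponential moving average would give an RMSProp-like variant, and setting `H` to a constant would reduce the inner step to a non-adaptive proximal SVRG update.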

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: AdaGrad · RMSProp