High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev; Marina Danilova; Eduard Gorbunov; Samuel Horváth; Gauthier Gidel; Pavel Dvurechensky; Alexander Gasnikov; Peter Richtárik

arXiv:2302.00999·math.OC·July 19, 2023·5 cites

Open Access · 1 Video

TL;DR

This paper develops high-probability convergence bounds for stochastic optimization and variational inequalities under an assumption weaker than bounded variance: the gradient/operator noise only needs a bounded central α-th moment for α in (1, 2]. The bounds cover non-convex and convex minimization as well as variational inequalities.

Contribution

It proposes several algorithms with high-probability convergence guarantees that assume only a bounded central α-th moment of the noise, α in (1, 2], rather than the traditional bounded variance.

Findings

01. Derived high-probability bounds under a bounded central α-th moment of the noise for α in (1, 2].

02. Applicable to a wide range of problem classes, including non-convex and convex minimization and variational inequalities.

03. Justifies the use of the considered methods in settings with unbounded variance or heavy-tailed noise.
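
The bounded central α-th moment condition named in the findings can be written out as a short sketch. The notation is an assumption made here for illustration and does not appear on this page: $f$ is the objective, $\xi$ the random sample, $\nabla f_{\xi}(x)$ the stochastic gradient, and $\sigma \ge 0$ the noise level.

```latex
% Sketch of the noise assumption; f, \xi, \nabla f_\xi and \sigma are assumed notation.
\mathbb{E}_{\xi}\left[\nabla f_{\xi}(x)\right] = \nabla f(x),
\qquad
\mathbb{E}_{\xi}\left[\left\|\nabla f_{\xi}(x) - \nabla f(x)\right\|^{\alpha}\right] \le \sigma^{\alpha},
\qquad
\alpha \in (1, 2].
```

For $\alpha = 2$ this reduces to the usual bounded-variance assumption. For $\alpha < 2$ the condition is weaker, and the variance of the noise may be infinite, which is the heavy-tailed regime the title refers to.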

Abstract

During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1, 2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly…

Videos

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis