Breaking the Lower Bound with (Little) Structure: Acceleration in Non-Convex Stochastic Optimization with Heavy-Tailed Noise

Zijian Liu; Jiawei Zhang; Zhengyuan Zhou

arXiv:2302.06763·cs.LG·September 6, 2023

TL;DR

This paper improves convergence guarantees for heavy-tailed stochastic optimization, showing that with minimal structure, faster rates than the known lower bounds are achievable using a new variance-reduced accelerated algorithm.

Contribution

It introduces a variance-reduced accelerated algorithm for structured stochastic optimization, surpassing existing lower bounds under mild assumptions.

Findings

01. Achieves a nearly optimal high-probability convergence guarantee without requiring the stochastic gradient itself to have a bounded $p$th moment.

02. Demonstrates a faster convergence rate when the problem has a small amount of additional structure.

03. Yields near-optimal rates even in the finite-variance case.

Abstract

We consider the stochastic optimization problem with smooth but not necessarily convex objectives in the heavy-tailed noise regime, where the stochastic gradient's noise is assumed to have a bounded $p$th moment ($p \in (1, 2]$). Zhang et al. (2020) were the first to prove the $\Omega(T^{\frac{1-p}{3p-2}})$ lower bound for convergence (in expectation) and provide a simple clipping algorithm that matches this optimal rate. Cutkosky and Mehta (2021) propose another algorithm, which is shown to achieve the nearly optimal high-probability convergence guarantee $O(\log(T/\delta) T^{\frac{1-p}{3p-2}})$, where $\delta$ is the probability of failure. However, this desirable guarantee is only established under the additional assumption that the stochastic gradient itself is bounded in $p$th moment, which fails to hold even for quadratic objectives and centered Gaussian noise. In this work, we first…
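To make the quantities in the abstract concrete, the following is a minimal sketch of generic norm-clipped SGD together with a helper that evaluates the exponent $\frac{1-p}{3p-2}$ of $T$. It is not the algorithm of Zhang et al. (2020), Cutkosky and Mehta (2021), or this paper. The function names, the quadratic test objective, the Student-t noise model, and the step size and clipping threshold are all assumptions chosen for illustration.

```python
import numpy as np


def rate_exponent(p):
    """Exponent (1 - p) / (3p - 2) of T in the rate quoted in the abstract."""
    if not 1.0 < p <= 2.0:
        raise ValueError("p must lie in (1, 2]")
    return (1.0 - p) / (3.0 * p - 2.0)


def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def noisy_quadratic_grad(x, rng, df=1.5):
    """Hypothetical stochastic gradient oracle (an assumption of this sketch).

    Returns the gradient of f(x) = 0.5 * ||x||^2 plus Student-t noise.
    With df = 1.5 the noise has zero mean and infinite variance, and its
    p-th moment is finite only for p < 1.5.
    """
    return x + rng.standard_t(df, size=x.shape)


def clipped_sgd(grad_fn, x0, steps, lr, tau, rng):
    """Plain SGD in which every stochastic gradient is norm-clipped to tau."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x, rng)
        x = x - lr * clip(g, tau)  # each step moves x by at most lr * tau
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_final = clipped_sgd(noisy_quadratic_grad, [5.0, 5.0],
                          steps=2000, lr=0.01, tau=1.0, rng=rng)
    print("exponent at p = 2:", rate_exponent(2.0))
    print("final iterate norm:", np.linalg.norm(x_final))
```

Here `rate_exponent(2.0)` returns `-0.25`, the finite-variance case of the exponent, and the exponent tends to 0 as $p \to 1$, so the quoted rate degrades as the tails get heavier. Clipping bounds every update by `lr * tau`, which is what keeps a single extreme noise draw from throwing the iterate arbitrarily far.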

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Markov Chains and Monte Carlo Methods