Near-Optimal Non-Convex Stochastic Optimization under Generalized Smoothness

Zijian Liu; Srikanth Jagabathula; Zhengyuan Zhou

arXiv:2302.06032·cs.LG·October 31, 2023

TL;DR

This paper introduces a new analysis of a simple variant of the STORM algorithm for generalized smooth non-convex stochastic optimization, achieving near-optimal high-probability and expected convergence guarantees with constant batch size.

Contribution

It provides the first near-optimal high-probability sample complexity for generalized smoothness and improves expected convergence bounds, all with constant batch size requirements.

Findings

01

Achieves a near-optimal $O(\log(1/(\delta\epsilon))\,\epsilon^{-3})$ high-probability sample complexity, where $\delta$ is the failure probability.

02

Recovers the optimal $O(\epsilon^{-3})$ expected sample complexity.

03

Requires only a constant batch size, unlike previous methods.
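
The constant-batch-size claim can be illustrated with a minimal sketch of the standard STORM recursive-momentum estimator (Cutkosky and Orabona), drawing one sample per iteration. This is not the paper's variant, whose step sizes and update rule are not given on this page. The toy objective $F(x) = x^4/4$ (which is not globally smooth but satisfies $|F''(x)| \le 3 + 3|F'(x)|$), the additive Gaussian noise, and the names `stoch_grad`, `storm_sketch`, `lr`, and `a` are all assumptions chosen for illustration.

```python
import random


def stoch_grad(x, xi, sigma=0.1):
    """Noisy gradient of the toy objective F(x) = x**4 / 4 (hypothetical example)."""
    return x ** 3 + sigma * xi


def storm_sketch(x0, steps=2000, lr=0.05, a=0.1, seed=0):
    """Standard STORM-style recursive momentum with ONE sample per step.

    lr (step size) and a (momentum weight) are fixed illustrative constants,
    not the schedules analyzed in the paper.
    """
    rng = random.Random(seed)
    x_prev = x0
    d = stoch_grad(x0, rng.gauss(0.0, 1.0))  # initial estimate from one sample
    x = x0 - lr * d
    for _ in range(steps):
        xi = rng.gauss(0.0, 1.0)  # a single fresh sample (batch size 1)
        # Evaluate the SAME sample at the current and previous iterates and use
        # the difference to correct the stale estimate d before reusing it.
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x


x_final = storm_sketch(x0=2.0)
print("final iterate:", x_final, "gradient magnitude:", abs(x_final) ** 3)
```

Each iteration costs two gradient evaluations on one sample, so the per-step cost stays constant regardless of the target accuracy. Large-batch methods instead need a batch that grows polynomially in $ϵ^{-1}$.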

Abstract

The generalized smooth condition, $(L_0, L_1)$-smoothness, has attracted growing interest because both empirical and theoretical evidence show it to be more realistic in many optimization problems. Two recent works established the $O(ϵ^{-3})$ sample complexity to obtain an $O(ϵ)$-stationary point. However, both require a large batch size on the order of $\mathrm{poly}(ϵ^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications. Additionally, these existing convergence bounds are established only in expectation, which is inadequate because it does not supply a useful performance guarantee for a single run. In this work, we solve both problems simultaneously by revisiting a simple variant of the STORM algorithm. Specifically, under $(L_0, L_1)$-smoothness and affine-type noises, we establish the first…
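
For readers unfamiliar with the condition named in the abstract, the form of $(L_0, L_1)$-smoothness most often used in the literature for a twice-differentiable objective $F$ is sketched here, together with the usual meaning of an $ϵ$-stationary point. The symbol $F$ and the second-order form are assumptions made for illustration; the paper's exact definition may differ.

$$
\|\nabla^2 F(x)\| \le L_0 + L_1 \|\nabla F(x)\| \quad \text{for all } x,
\qquad
x \text{ is } ϵ\text{-stationary if } \|\nabla F(x)\| \le ϵ .
$$

Setting $L_1 = 0$ recovers the standard smoothness assumption with constant $L_0$, so the generalized condition allows the local curvature to grow with the gradient norm.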

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research