Tight Lower Bounds and Optimal Algorithms for Stochastic Nonconvex Optimization with Heavy-Tailed Noise
Adrien Fradin, Abdurakhmon Sadiev, Laurent Condat, Peter Richtárik

TL;DR
This paper establishes tight sample complexity lower bounds for stochastic nonconvex optimization with heavy-tailed noise and presents algorithms that match them, both in expectation and with high probability.
Contribution
It extends first-order lower bounds to relaxed smoothness and similarity assumptions and introduces algorithms that match these bounds, including high-probability guarantees under weaker conditions.
Findings
Normalized Stochastic Gradient Descent with Momentum Variance Reduction matches the lower bounds.
Double-Clipped NSGD-MVR achieves high-probability convergence under weaker assumptions.
New sharper lower bounds for second-order methods improve previous results.
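To make the first finding concrete, here is a minimal sketch of a normalized SGD step driven by a momentum variance-reduced gradient estimator, in the form such methods usually take. It is not the paper's code. The toy objective, the additive Student-t noise oracle, and the names `grad_F`, `stoch_grad`, `nsgd_mvr`, `gamma`, `alpha`, and `df` are all assumptions made for illustration. The double-clipped variant is not sketched because this summary does not describe its clipping rule.

```python
import numpy as np

def grad_F(x):
    # Gradient of a toy nonconvex objective F(x) = sum(x_i^2 / (1 + x_i^2)).
    # (Assumed example, not taken from the paper.)
    return 2.0 * x / (1.0 + x**2) ** 2

def stoch_grad(x, xi):
    # Hypothetical stochastic oracle: true gradient plus additive noise xi.
    return grad_F(x) + xi

def nsgd_mvr(x0, steps=2000, gamma=0.01, alpha=0.1, df=1.5, seed=0):
    # df=1.5 gives Student-t noise with infinite variance (heavy tails).
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)

    # Initialize the estimator with a single stochastic gradient.
    xi = rng.standard_t(df, size=x_prev.shape)
    g = stoch_grad(x_prev, xi)
    x = x_prev - gamma * g / (np.linalg.norm(g) + 1e-12)

    for _ in range(steps):
        # One fresh sample, evaluated at both the current and previous iterate.
        xi = rng.standard_t(df, size=x.shape)
        # Momentum variance reduction: new gradient plus a corrected
        # carry-over of the previous estimate.
        g = stoch_grad(x, xi) + (1.0 - alpha) * (g - stoch_grad(x_prev, xi))
        x_prev = x
        # Normalized step: the update length never exceeds gamma,
        # no matter how large a noisy estimate becomes.
        x = x - gamma * g / (np.linalg.norm(g) + 1e-12)
    return x
```

Because the update is normalized, a single extreme noise sample can change the direction of a step but not its length, which is one reason normalization is a natural fit for heavy-tailed noise. With purely additive noise, as assumed here, the correction term cancels the fresh noise exactly, so this toy oracle is an easier case than a general one.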
Abstract
We study stochastic nonconvex optimization under heavy-tailed noise. In this setting, the stochastic gradients only have bounded $p$-th central moment ($p$-BCM) for some $p \in (1, 2]$. Building on the foundational work of Arjevani et al. (2022) in stochastic optimization, we establish tight sample complexity lower bounds for all first-order methods under relaxed mean-squared smoothness ($q$-WAS) and $\delta$-similarity ($(q, \delta)$-S) assumptions, allowing any exponent $q \in [1, 2]$ instead of the standard $q = 2$. These results substantially broaden the scope of existing lower bounds. To complement them, we show that Normalized Stochastic Gradient Descent with Momentum Variance Reduction (NSGD-MVR), a known algorithm, matches these bounds in expectation. Beyond expectation guarantees, we introduce a new algorithm, Double-Clipped NSGD-MVR, which allows the derivation of…
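For reference, the bounded central moment condition named in the abstract is usually written as below. This is the common formulation in the heavy-tailed optimization literature, not a quotation from the paper. The symbols $F$ (objective), $f(\cdot, \xi)$ (stochastic sample), and $\sigma$ (noise level) are assumed notation.

```latex
% Bounded p-th central moment (p-BCM), standard form (assumed notation):
\mathbb{E}_{\xi}\!\left[ \left\| \nabla f(x, \xi) - \nabla F(x) \right\|^{p} \right] \le \sigma^{p}
\quad \text{for all } x, \qquad p \in (1, 2].
```

Setting $p = 2$ gives the familiar bounded-variance assumption. For $p < 2$ the variance may be infinite, which is the heavy-tailed regime.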
