Tight Lower Bounds and Optimal Algorithms for Stochastic Nonconvex Optimization with Heavy-Tailed Noise

Adrien Fradin; Abdurakhmon Sadiev; Laurent Condat; Peter Richtárik

arXiv:2512.18713·math.OC·April 1, 2026

TL;DR

This paper establishes tight sample complexity lower bounds for first-order stochastic nonconvex optimization under heavy-tailed noise, where the stochastic gradients only have a bounded $p$-th central moment for some $p \in (1, 2]$, and presents algorithms that match these bounds.

Contribution

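It extends existing lower bounds to relaxed smoothness and similarity assumptions with any exponent $q \in [1, 2]$ instead of the standard $q = 2$, shows that a known algorithm (NSGD-MVR) matches these bounds in expectation, and introduces a new algorithm (Double-Clipped NSGD-MVR) with high-probability guarantees under weaker conditions.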

Findings

01

Normalized Stochastic Gradient Descent with Momentum Variance Reduction matches the lower bounds.

02

Double-Clipped NSGD-MVR achieves high-probability convergence under weaker assumptions.

03

New sharper lower bounds for second-order methods improve previous results.
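
As a rough illustration of the first finding, the sketch below combines the two ingredients named in NSGD-MVR: a momentum variance-reduction (STORM-style) gradient estimator and a normalized step. It is a minimal sketch under assumptions, not the paper's exact method: the function and parameter names (`nsgd_mvr`, `stoch_grad`, `sample`, `lr`, `alpha`) are hypothetical, the step size and momentum are held constant, and the paper's parameter schedules are not reproduced here.

```python
import numpy as np


def nsgd_mvr(stoch_grad, sample, x0, steps=1000, lr=0.01, alpha=0.1, seed=0):
    """Illustrative normalized SGD with momentum variance reduction.

    stoch_grad(x, xi) -> stochastic gradient at x for noise sample xi (assumed API)
    sample(rng)       -> draws one noise sample xi (assumed API)
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero when the estimator vanishes

    x_prev = np.asarray(x0, dtype=float)
    g = stoch_grad(x_prev, sample(rng))  # plain stochastic gradient to start
    x = x_prev - lr * g / max(np.linalg.norm(g), eps)

    for _ in range(steps - 1):
        xi = sample(rng)
        # The same sample xi is evaluated at the current and previous iterates,
        # so the correction term cancels part of the noise.
        g = stoch_grad(x, xi) + (1.0 - alpha) * (g - stoch_grad(x_prev, xi))
        # Normalized step: every update has length lr (up to eps),
        # no matter how large the noisy estimator g is.
        x_prev, x = x, x - lr * g / max(np.linalg.norm(g), eps)
    return x


# Usage: a quadratic objective with Student-t noise (df = 1.5, infinite variance).
x_final = nsgd_mvr(
    stoch_grad=lambda x, xi: x + xi,
    sample=lambda rng: rng.standard_t(1.5, size=2),
    x0=[5.0, -3.0],
)
```

Normalization is what keeps a single heavy-tailed sample from producing an arbitrarily large step, since the update length is fixed by `lr` alone.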

Abstract

We study stochastic nonconvex optimization under heavy-tailed noise. In this setting, the stochastic gradients only have bounded $p$-th central moment ($p$-BCM) for some $p \in (1, 2]$. Building on the foundational work of Arjevani et al. (2022) in stochastic optimization, we establish tight sample complexity lower bounds for all first-order methods under relaxed mean-squared smoothness ($q$-WAS) and $\delta$-similarity ($(q, \delta)$-S) assumptions, allowing any exponent $q \in [1, 2]$ instead of the standard $q = 2$. These results substantially broaden the scope of existing lower bounds. To complement them, we show that Normalized Stochastic Gradient Descent with Momentum Variance Reduction (NSGD-MVR), a known algorithm, matches these bounds in expectation. Beyond expectation guarantees, we introduce a new algorithm, Double-Clipped NSGD-MVR, which allows the derivation of…
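
For reference, the $p$-BCM condition mentioned in the abstract is commonly written as below. The notation is an assumption made for illustration and may differ from the paper's: $F$ is the objective, $\nabla f(x; \xi)$ is the stochastic gradient for a random sample $\xi$, and $\sigma \ge 0$ is the noise level.

```latex
\mathbb{E}_{\xi}\!\left[\nabla f(x; \xi)\right] = \nabla F(x),
\qquad
\mathbb{E}_{\xi}\!\left[\left\| \nabla f(x; \xi) - \nabla F(x) \right\|^{p}\right] \le \sigma^{p}
\quad \text{for all } x, \qquad p \in (1, 2].
```

With $p = 2$ this is the usual bounded-variance assumption; for $p < 2$ the variance of the stochastic gradient may be infinite, which is the heavy-tailed regime.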
