Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization

Aleksandar Armacki; Dragana Bajovic; Dusan Jakovetic; Soummya Kar

arXiv:2507.09093 · stat.ML · February 11, 2026

TL;DR

This paper studies a nonlinear stochastic gradient descent (N-SGD) framework for non-convex optimization. It establishes high-probability convergence rates under heavy-tailed noise with a symmetric PDF, and extends the guarantees to non-symmetric noise through estimators based on noise symmetrization.

Contribution

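The paper develops a unified black-box nonlinear SGD framework, covering nonlinearities such as sign, clipping and normalization, and proposes two novel symmetrization-based estimators. Together these achieve optimal convergence rates under relaxed heavy-tailed noise conditions, including non-symmetric noise.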

Findings

01

N-SGD attains the rate $O(t^{-1/2})$ with exponentially decaying tails.

02

Symmetrized estimators handle non-symmetric heavy-tailed noise effectively.

03

The framework improves on convergence guarantees that prior work obtained under bounded-moment assumptions.
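The symmetrization idea behind the second finding can be illustrated with a standard probabilistic fact: the difference of two independent, identically distributed noise samples has a distribution that is symmetric about zero, even when each sample is skewed. The sketch below checks this empirically. It is only an illustration of that fact, not the paper's estimators; the centered Lomax noise model and the helper names `centered_lomax`, `symmetrize` and `demo` are assumptions made for this example.

```python
import numpy as np


def centered_lomax(rng, n, shape=1.5):
    """Zero-mean, right-skewed, heavy-tailed noise (illustrative choice).

    rng.pareto draws from a Lomax (Pareto II) distribution, whose mean is
    1 / (shape - 1) for shape > 1. With shape = 1.5 the variance is infinite.
    """
    return rng.pareto(shape, size=n) - 1.0 / (shape - 1.0)


def symmetrize(z1, z2):
    """Difference of two i.i.d. samples: symmetric about zero by construction."""
    return z1 - z2


def demo(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    z1 = centered_lomax(rng, n)
    z2 = centered_lomax(rng, n)
    frac_pos_raw = float(np.mean(z1 > 0))                  # far from 0.5: skewed
    frac_pos_sym = float(np.mean(symmetrize(z1, z2) > 0))  # close to 0.5
    return frac_pos_raw, frac_pos_sym


if __name__ == "__main__":
    raw, sym = demo()
    print(f"P(noise > 0) raw: {raw:.3f}, symmetrized: {sym:.3f}")
```

The raw noise is positive far less than half of the time, while the symmetrized difference is positive about half of the time. The price of symmetrization in this toy setting is that two noise samples are consumed per symmetrized sample.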

Abstract

We study convergence in high probability of SGD-type methods in non-convex optimization in the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $O(t^{-1/2})$, for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators, based on the idea of noise symmetrization. The first, dubbed Symmetrized Gradient Estimator (SGE), assumes a noiseless gradient at any reference point is available at the start of training, while the second, dubbed…
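The black-box nonlinear framework described in the abstract can be sketched as an SGD loop in which a nonlinearity is applied to each stochastic gradient before the step is taken. The code below is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the clipping threshold `tau`, the step-size schedule `a / sqrt(t)` and the Cauchy noise model (symmetric PDF, no finite mean) are all choices made for illustration.

```python
import numpy as np


def sign_nl(g):
    """Component-wise sign."""
    return np.sign(g)


def clip_nl(g, tau=1.0):
    """Joint clipping: rescale g so its Euclidean norm is at most tau."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def normalize_nl(g, eps=1e-12):
    """Normalization: rescale g to (approximately) unit Euclidean norm."""
    return g / (np.linalg.norm(g) + eps)


def nsgd(grad, x0, nonlinearity, steps=2000, a=0.5, seed=0):
    """Nonlinear SGD sketch: x <- x - step * nonlinearity(noisy gradient).

    The noise is standard Cauchy, an assumed example of heavy-tailed noise
    with a symmetric PDF. The schedule a / sqrt(t) is also an assumption.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, steps + 1):
        noisy_grad = grad(x) + rng.standard_cauchy(size=x.shape)
        x = x - (a / np.sqrt(t)) * nonlinearity(noisy_grad)
    return x


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x.
    x_final = nsgd(lambda x: x, [5.0, -3.0], clip_nl)
    print("final iterate:", x_final)
```

Because every nonlinearity here bounds the size of its output, a single extreme noise sample can move the iterate by at most a bounded amount per step, which is the intuition for robustness to heavy tails. Swapping `clip_nl` for `sign_nl` or `normalize_nl` changes only the argument passed to `nsgd`.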
