Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise

Antesh Upadhyay; Arda Fazla; Abolfazl Hashemi

arXiv:2605.15314 · cs.LG · May 18, 2026


TL;DR

This paper introduces variance-reduced normalized methods for nonconvex stochastic optimization under Blum-Gladyshev noise, providing convergence guarantees and optimal complexity bounds in both standard and generalized smoothness settings.

Contribution

It presents the first convergence guarantees for normalized methods under BG-0 noise without bounded domains or increased batch sizes, covering both smoothness regimes.

Findings

01

Normalized SGD with momentum converges with $O(ε^{-6})$ complexity.

02

Variance-reduced normalized STORM achieves minimax optimal $O(ε^{-4})$ complexity.

03

Results recover standard rates when noise parameters vanish.

Abstract

We study nonconvex stochastic optimization under the Blum-Gladyshev (BG-0) noise model, where the stochastic gradient variance grows quadratically with the distance from the initialization. We consider this problem under both standard smoothness and the symmetric generalized-smoothness framework, which captures objectives whose local curvature can scale with the gradient norm. We prove that normalized stochastic gradient descent with momentum, using only one stochastic gradient per iteration, converges under BG-0 noise with oracle complexity $O(ε^{-6})$. This rate holds both for standard smoothness and for $α$-symmetric generalized smoothness, showing that generalized smoothness is rate-neutral for normalized momentum in this setting. We then study a variance-reduced normalized STORM method. Under mean-square smoothness and sharp initialization, the…
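
As a rough illustration of the first method named in the abstract, the sketch below runs normalized SGD with momentum on a toy nonconvex objective, drawing one stochastic gradient per iteration. Everything concrete here is an assumption made for illustration and is not taken from the paper: the objective `f(x) = sum(1 - cos(x_i))`, the Gaussian noise with total variance `sigma0**2 + sigma1**2 * ||x - x0||**2` (one simple way to make the variance grow quadratically with the distance from the initialization), the function names, and the constant step size `lr` and momentum `beta`. The update itself is the commonly used form: an exponential moving average of gradients followed by a step of fixed length along the normalized average.

```python
import math
import random


def true_grad(x):
    # Gradient of the assumed toy objective f(x) = sum(1 - cos(x_i)).
    return [math.sin(xi) for xi in x]


def grad_norm(x):
    return math.sqrt(sum(g * g for g in true_grad(x)))


def noisy_grad(x, x0, rng, sigma0=0.1, sigma1=0.1):
    # Assumed noise model: total variance sigma0^2 + sigma1^2 * ||x - x0||^2,
    # split evenly across coordinates.
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, x0))
    std = math.sqrt((sigma0 ** 2 + sigma1 ** 2 * dist_sq) / len(x))
    return [g + std * rng.gauss(0.0, 1.0) for g in true_grad(x)]


def normalized_sgd_momentum(x0, steps=2000, lr=0.01, beta=0.9, seed=0):
    # Hypothetical hyperparameters; the paper's schedules are not reproduced.
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)
    for _ in range(steps):
        g = noisy_grad(x, x0, rng)  # one stochastic gradient per iteration
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        norm = math.sqrt(sum(mi * mi for mi in m))
        if norm > 0.0:
            # Normalized step: every update has length exactly lr.
            x = [xi - lr * mi / norm for xi, mi in zip(x, m)]
    return x


if __name__ == "__main__":
    x0 = [2.0, -1.5]
    x_final = normalized_sgd_momentum(x0)
    print("initial gradient norm:", grad_norm(x0))
    print("final gradient norm:", grad_norm(x_final))
```

Because each update has length exactly `lr`, the iterate can move at most `steps * lr` away from the initialization, which also caps how large the distance-dependent noise in this toy model can become.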
