Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise
Antesh Upadhyay, Arda Fazla, Abolfazl Hashemi

TL;DR
This paper introduces variance-reduced normalized methods for nonconvex stochastic optimization under Blum-Gladyshev noise, providing convergence guarantees and optimal complexity bounds in both standard and generalized smoothness settings.
Contribution
It presents the first convergence guarantees for normalized methods under BG-0 noise without bounded domains or increased batch sizes, covering both smoothness regimes.
Findings
Normalized SGD with momentum converges with $O(\epsilon^{-6})$ complexity.
Variance-reduced normalized STORM achieves minimax optimal $O(\epsilon^{-4})$ complexity.
Results recover standard rates when noise parameters vanish.
Abstract
We study nonconvex stochastic optimization under the Blum-Gladyshev (BG-0) noise model, where the stochastic gradient variance grows quadratically with the distance from the initialization. We consider this problem under both standard smoothness and the symmetric generalized-smoothness framework, which captures objectives whose local curvature can scale with the gradient norm. We prove that normalized stochastic gradient descent with momentum, using only one stochastic gradient per iteration, converges under BG-0 noise with oracle complexity $O(\epsilon^{-6})$. This rate holds both for standard smoothness and for -symmetric generalized smoothness, showing that generalized smoothness is rate-neutral for normalized momentum in this setting. We then study a variance-reduced normalized STORM method. Under mean-square smoothness and sharp initialization, the…
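To make the first method concrete, the following is a minimal sketch of normalized SGD with momentum using one stochastic gradient per iteration. It is not the authors' implementation. The noise model is an assumed reading of BG-0, with variance `sigma0**2 + sigma1**2 * ||x - x_init||**2`. The names `sigma0` and `sigma1`, the constant step size `lr`, the momentum weight `beta`, and the test objective are illustrative choices that do not come from the paper.

```python
import numpy as np


def normalized_sgd_momentum(grad, x0, steps=3000, lr=0.005, beta=0.9,
                            sigma0=0.1, sigma1=0.1, seed=0):
    """Sketch of normalized SGD with momentum under synthetic noise.

    Assumption (not taken from the paper): the noise variance equals
    sigma0**2 + sigma1**2 * ||x - x_init||**2, i.e. it grows
    quadratically with the distance from the initial point.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)   # working copy of the iterate
    x_init = x.copy()               # reference point for the noise model
    m = np.zeros_like(x)            # momentum buffer
    for _ in range(steps):
        # Noise standard deviation at the current iterate.
        scale = np.sqrt(sigma0**2 + sigma1**2 * np.sum((x - x_init) ** 2))
        # Dividing by sqrt(dimension) makes E||noise||^2 equal scale**2.
        noise = scale * rng.standard_normal(x.shape) / np.sqrt(x.size)
        g = grad(x) + noise         # a single stochastic gradient
        m = beta * m + (1.0 - beta) * g
        norm = np.linalg.norm(m)
        if norm > 0.0:
            # Normalized update: every step has length exactly lr.
            x = x - lr * m / norm
    return x


def grad_f(x):
    # Gradient of the illustrative nonconvex objective
    # f(x) = sum(x_i**2 + 3 * sin(x_i)**2).
    return 2.0 * x + 3.0 * np.sin(2.0 * x)


x_start = np.array([2.0, -1.5])
x_final = normalized_sgd_momentum(grad_f, x_start)
print("initial gradient norm:", np.linalg.norm(grad_f(x_start)))
print("final gradient norm:  ", np.linalg.norm(grad_f(x_final)))
```

Because each update has fixed length `lr`, a gradient estimate inflated by distance-dependent noise cannot enlarge the step. This is one intuition for why normalization suits a noise model whose variance is unbounded over the domain.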
