Revisiting Stochastic Approximation and Stochastic Gradient Descent

Rajeeva Laxman Karandikar; Bhamidi Visweswara Rao; Mathukumalli Vidyasagar

arXiv:2505.11343·math.OC·November 11, 2025

TL;DR

This paper introduces a new way to prove convergence of the Stochastic Approximation (SA) and Stochastic Gradient Descent (SGD) algorithms, based on a Generalized Strong Law of Large Numbers (GSLLN). The approach permits a wider class of noise than the ODE and martingale approaches.

Contribution

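The paper presents a proof technique for SA and SGD convergence built on the GSLLN. Its sufficient conditions decouple the properties of the function whose zero is sought from the properties of the measurement noise, and the analysis also covers zero-order methods.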

Findings

01

Allows noise without a finite second moment and, under suitable conditions, without a finite mean.

02

Provides the weakest convergence conditions to date.

03

Extends convergence analysis to zero-order SGD.

Abstract

In this paper, we introduce a new approach to proving the convergence of the Stochastic Approximation (SA) and the Stochastic Gradient Descent (SGD) algorithms. The new approach is based on a concept called GSLLN (Generalized Strong Law of Large Numbers), which extends the traditional SLLN. Using this concept, we provide sufficient conditions for convergence, which effectively decouple the properties of the function whose zero we are trying to find from the properties of the measurement errors (noise sequence). The new approach provides an alternative to the two widely used approaches, namely the ODE approach and the martingale approach, and also permits a wider class of noise signals than either of the two known approaches. In particular, the "noise" or measurement error need not have a finite second moment, and under suitable conditions, not even a finite mean. By adapting…
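To make the link between SA and a law of large numbers concrete, here is a minimal Python sketch. It is not the paper's algorithm or its GSLLN conditions. The function names `robbins_monro` and `symmetric_pareto`, the step size 1/(t+1), the target f(θ) = θ − 2, and the Pareto noise are all assumptions chosen for illustration. With this linear f and this step size, the error after T steps equals minus the average of the first T noise samples, so the iterate converges exactly when the noise average does. The symmetric Pareto noise with shape 1.5 has zero mean and infinite variance.

```python
import random


def robbins_monro(f, theta0, steps, noise):
    """Classical SA iteration: theta <- theta - a_t * (f(theta) + noise).

    Uses the step size a_t = 1 / (t + 1), an illustrative choice.
    `noise` is a zero-argument callable returning one measurement error.
    """
    theta = theta0
    for t in range(steps):
        step = 1.0 / (t + 1)
        theta -= step * (f(theta) + noise())
    return theta


def symmetric_pareto(rng, alpha=1.5):
    """Heavy-tailed sample: a Pareto magnitude with a random sign.

    For 1 < alpha <= 2 the mean is zero (by symmetry) and the variance
    is infinite.
    """
    magnitude = rng.paretovariate(alpha)
    return magnitude if rng.random() < 0.5 else -magnitude


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    f = lambda theta: theta - 2.0  # hypothetical target with zero at 2
    estimate = robbins_monro(f, 10.0, 100_000, lambda: symmetric_pareto(rng))
    print(estimate)  # typically close to 2, despite infinite-variance noise
```

Replacing the noise callable with a constant shows the averaging identity directly: a constant error of 1 drives the iterate to 1 rather than 2, since the noise average is then 1.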

Taxonomy

Methods: Stochastic Gradient Descent