Stochastic Gradient Descent Revisited

Azar Louzi

arXiv:2412.06070·math.OC·March 11, 2025

TL;DR

This paper provides a comprehensive convergence analysis of biased nonconvex stochastic gradient descent, covering various convergence types and rates under mild assumptions, enhancing theoretical understanding of SGD in machine learning.

Contribution

It offers a full-scope convergence study of biased nonconvex SGD, covering weak, function-value, and global convergence, together with convergence rates and complexities under mild conditions.

Findings

01 Established weak convergence of biased nonconvex SGD

02 Derived convergence rates and complexities

03 Provided conditions under which convergence guarantees hold

Abstract

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory, however, often requires a strong framework to guarantee convergence properties. We hereby present a full-scope convergence study of biased nonconvex SGD, including weak convergence, function-value convergence and global convergence, and also provide subsequent convergence rates and complexities, all under relatively mild conditions in comparison with the literature.
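To make "biased nonconvex SGD" concrete, here is a minimal sketch of the setting the abstract describes. This page does not state the paper's assumptions, bias model, or step-size schedules, so everything below is a hypothetical illustration: the toy objective f(x) = x**2 + 3*sin(x)**2 (nonconvex, since f''(x) = 2 + 6*cos(2x) can be negative), the oracle that adds a vanishing deterministic bias plus Gaussian noise to the true gradient, and the Robbins-Monro-style step sizes gamma_k = gamma0 / (k + 1)**0.75. The run tracks the smallest squared gradient norm seen, a common stationarity measure in nonconvex analyses.

```python
import math
import random


def grad_f(x: float) -> float:
    """Exact gradient of the toy nonconvex objective f(x) = x**2 + 3*sin(x)**2 (hypothetical example)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def biased_sgd(x0, n_iters=10_000, gamma0=0.1, bias0=0.5, sigma=1.0, seed=0):
    """Run SGD with a biased, noisy gradient oracle (assumed toy setup, not the paper's).

    Oracle: g_k = grad_f(x_k) + bias0 / (k + 1) + N(0, sigma**2).
    Step size: gamma_k = gamma0 / (k + 1) ** 0.75, so sum gamma_k diverges
    while sum gamma_k**2 converges.
    Returns the last iterate and the smallest squared gradient norm seen.
    """
    rng = random.Random(seed)
    x = x0
    min_grad_sq = grad_f(x) ** 2
    for k in range(n_iters):
        bias = bias0 / (k + 1)              # deterministic bias that vanishes over time
        noise = rng.gauss(0.0, sigma)       # zero-mean sampling noise
        g = grad_f(x) + bias + noise        # biased stochastic gradient estimate
        gamma = gamma0 / (k + 1) ** 0.75    # decreasing step size
        x -= gamma * g                      # SGD update
        min_grad_sq = min(min_grad_sq, grad_f(x) ** 2)
    return x, min_grad_sq


x_final, min_grad_sq = biased_sgd(x0=3.0)
print(f"last iterate: {x_final:.4f}, min |f'(x_k)|^2: {min_grad_sq:.2e}")
```

Because the bias decays and the step sizes shrink, the iterates settle near the only stationary point of this toy f (x = 0) despite the biased oracle; the paper's actual conditions for such behavior may be weaker or differ in form.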

Taxonomy

Methods: Stochastic Gradient Descent