Generalized Stochastic Gradient Descent with Momentum Methods for Smooth Optimization
Zimeng Wang, Alp Yurtsever

TL;DR
This paper introduces a unified theoretical framework for stochastic gradient descent with momentum, providing convergence guarantees for a wide range of momentum-based optimization methods in both convex and nonconvex settings.
Contribution
It unifies various momentum methods under a generalized framework and offers comprehensive convergence analyses with flexible parameter choices.
Findings
Established ergodic convergence for convex problems with constant parameters.
Derived improved convergence rates for convex problems with time-varying parameters.
Proved sublinear convergence to stationary points in nonconvex problems.
Abstract
Stochastic gradient descent with momentum (SGDM) methods have become fundamental optimization tools in machine learning, combining the computational efficiency of stochastic gradients with the acceleration benefits of momentum. Despite their widespread use in practice, the theoretical understanding of SGDM remains incomplete, with most existing analyses focusing on specific momentum schemes or requiring restrictive assumptions. In this paper, we introduce a generalized SGDM framework that unifies a broad class of momentum-based methods, including SGD with Polyak's momentum, SGD with Nesterov's momentum, and many others. We provide comprehensive convergence analyses for both convex and nonconvex optimization problems under mild smoothness and bounded variance assumptions. For convex problems, we establish general ergodic convergence results with constant parameters and derive improved…
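To make the idea of unifying momentum schemes concrete, the following minimal sketch shows one textbook way to write Polyak's (heavy-ball) momentum and Nesterov's momentum as a single update rule. It is not the paper's generalized framework, whose exact parameterization is not given in the text above. The function `sgdm`, the `lookahead` parameter, the additive Gaussian noise model, and the quadratic test problem are all assumptions introduced here for illustration.

```python
import random


def sgdm(grad, x0, lr=0.1, beta=0.9, lookahead=0.0, steps=200, noise=0.0, seed=0):
    """Scalar SGD with momentum (illustrative sketch, not the paper's method).

    lookahead = 0.0 gives Polyak (heavy-ball) momentum.
    lookahead = 1.0 gives Nesterov momentum.
    `noise` scales zero-mean Gaussian noise added to the gradient, a simple
    stand-in for a stochastic gradient with bounded variance.
    """
    rng = random.Random(seed)
    x, v = float(x0), 0.0
    for _ in range(steps):
        # Point at which the (noisy) gradient is queried.
        y = x + lookahead * beta * v
        g = grad(y) + noise * rng.gauss(0.0, 1.0)
        # Momentum buffer update, then parameter update.
        v = beta * v - lr * g
        x = x + v
    return x


def quad_grad(x):
    # Gradient of the smooth convex test function f(x) = x**2 (hypothetical example).
    return 2.0 * x


x_polyak = sgdm(quad_grad, x0=5.0, lookahead=0.0)
x_nesterov = sgdm(quad_grad, x0=5.0, lookahead=1.0)
```

With `noise=0.0` both variants drive the iterate toward the minimizer at zero. Setting `noise` to a positive value gives a rough feel for the stochastic setting; the fixed `seed` keeps such runs reproducible.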
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods
