Convergence of SGD with momentum in the nonconvex case: A time window-based analysis
Junwen Qiu, Bohao Ma, Andre Milzarek

TL;DR
This paper analyzes the convergence of stochastic gradient descent with momentum (SGDM) in nonconvex optimization by introducing a novel time window-based approach, establishing convergence results under the Łojasiewicz property.
Contribution
It introduces a new time window-based analysis method for SGDM in nonconvex settings, providing convergence results and local rates under the Łojasiewicz property.
Findings
Established iterate convergence of SGDM under the Łojasiewicz property.
Derived local convergence rates that depend on the Łojasiewicz exponent and the step size.
Provided analysis applicable to large-scale stochastic optimization problems.
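For readers unfamiliar with the property named above, the following is the commonly used textbook form of the Łojasiewicz gradient inequality. It is included as background only: this summary does not reproduce the paper's exact formulation, so the symbols x^*, U, C, and θ below are assumptions for illustration.

```latex
% Standard Lojasiewicz gradient inequality near a stationary point x^*
% (textbook form; the paper's precise statement may differ).
\[
  |f(x) - f(x^*)|^{\theta} \;\le\; C \,\|\nabla f(x)\|
  \qquad \text{for all } x \in U,
\]
% where U is a neighborhood of x^*, C > 0 is a constant,
% and \theta \in [1/2, 1) is the Lojasiewicz exponent.
```

Informally, the exponent θ measures how flat f is around x^*, which is why local rates of this kind are usually stated in terms of θ.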
Abstract
The stochastic gradient descent method with momentum (SGDM) is a common approach for solving large-scale and stochastic optimization problems. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and challenges in simultaneously controlling the momentum and stochastic errors in an almost sure sense. To address these challenges, we investigate the behavior of SGDM over specific time windows, rather than examining the descent of consecutive iterates as in traditional studies. This time window-based approach simplifies the convergence analysis and enables us to establish the iterate convergence result for SGDM under the Łojasiewicz property. We further provide local convergence rates which depend on the underlying Łojasiewicz exponent and the utilized step size…
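To make the method discussed in the abstract concrete, here is a minimal sketch of SGD with heavy-ball momentum on a one-dimensional nonconvex function. It is not the paper's algorithm statement or experiment: the update form, the objective f(x) = (x^2 - 1)^2, the Gaussian noise model, the step size schedule alpha_k = a / (k + 1)^gamma, and every parameter name are assumptions chosen for illustration.

```python
import random


def grad_f(x):
    """Gradient of the illustrative nonconvex objective f(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x * x - 1.0)


def sgdm(x0, num_iters=5000, a=0.05, gamma=0.75, beta=0.9, noise_std=0.1, seed=0):
    """Heavy-ball SGDM sketch (assumed form, not taken from the paper).

    m_{k+1} = beta * m_k + g_k          (g_k: noisy gradient at x_k)
    x_{k+1} = x_k - alpha_k * m_{k+1}   (alpha_k = a / (k + 1)^gamma)
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    for k in range(num_iters):
        # Stochastic gradient: true gradient plus zero-mean Gaussian noise.
        g = grad_f(x) + rng.gauss(0.0, noise_std)
        # Momentum buffer accumulates past stochastic gradients.
        m = beta * m + g
        # Diminishing step size.
        alpha = a / (k + 1) ** gamma
        x = x - alpha * m
    return x


if __name__ == "__main__":
    x_final = sgdm(x0=2.0)
    print("final iterate:", x_final, "gradient:", grad_f(x_final))
```

The objective has two minimizers, x = 1 and x = -1, so which one the iterates settle near depends on the starting point, the momentum, and the noise. Note that consecutive iterates need not decrease f, which is the lack of a sufficient descent property that the abstract mentions.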
Taxonomy
Topics: Matrix Theory and Algorithms · Sparse and Compressive Sensing Techniques · Advanced Thermodynamics and Statistical Mechanics
