Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters

Mathukumalli Vidyasagar

arXiv:2506.11904 · math.OC · September 10, 2025

Open Access

TL;DR

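This paper presents a unified momentum-based algorithm for stochastic optimization in which the momentum parameter may vary with the iteration counter. The formulation includes the Stochastic Heavy Ball (SHB) and Stochastic Nesterov Accelerated Gradient (SNAG) algorithms as special cases, and convergence is established under general assumptions on the stochastic gradient, which may be biased and may have a conditional variance that grows without bound over time.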

Contribution

It provides a general convergence analysis for a unified momentum-based stochastic optimization algorithm with time-varying parameters under broad gradient assumptions.

Findings

01. Includes SHB and SNAG as special cases.

02. Provides natural convergence conditions generalizing classical stochastic approximation.

03. Shows impracticality of a certain existing method with time-varying momentum.
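
For orientation, the classical stochastic approximation conditions referred to in the second finding are usually stated as the Robbins-Monro step-size conditions below. The symbol \alpha_t for the step size at iteration t is an assumption of this note; the paper's own generalized conditions are not stated on this page and are not reproduced here.

```latex
\sum_{t=0}^{\infty} \alpha_t = \infty,
\qquad
\sum_{t=0}^{\infty} \alpha_t^{2} < \infty .
```

A schedule such as \alpha_t = 1/(t+1) satisfies both conditions, whereas a constant step size violates the second.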

Abstract

In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the true gradient at the previous iteration. Our formulation includes the Stochastic Heavy Ball (SHB) and the Stochastic Nesterov Accelerated Gradient (SNAG) algorithms as special cases. In addition, in our formulation, the momentum term is allowed to vary as a function of time (i.e., the iteration counter). The assumptions on the stochastic gradient are the most general in the literature, in that it can be biased, and have a conditional variance that grows in an unbounded fashion as a function of time. This last feature is crucial in order to make the theory applicable to "zero-order" methods, where the gradient is estimated using just two function…
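
To make the two special cases concrete, here is a minimal sketch of SHB-style and SNAG-style updates with step size and momentum that depend on the iteration counter. It uses the textbook forms of heavy-ball and Nesterov momentum, not necessarily the paper's exact parameterization. The function name `momentum_descent`, the schedules `alpha(t)` and `mu(t)`, and the additive Gaussian noise model are all assumptions made for illustration.

```python
import numpy as np


def momentum_descent(grad, x0, steps, alpha, mu, variant="shb",
                     noise_std=0.0, seed=0):
    """Illustrative momentum iteration with time-varying parameters.

    grad      : callable returning the true gradient at a point
    alpha, mu : callables t -> step size and t -> momentum (hypothetical schedules)
    variant   : "shb" evaluates the gradient at the current iterate,
                "snag" evaluates it at the look-ahead point
    noise_std : standard deviation of additive Gaussian gradient noise (assumed model)
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for t in range(steps):
        velocity = x - x_prev                 # momentum direction x_t - x_{t-1}
        lookahead = x + mu(t) * velocity      # point reached by momentum alone
        point = lookahead if variant == "snag" else x
        # Noisy gradient: true gradient plus zero-mean Gaussian noise.
        g = grad(point) + noise_std * rng.standard_normal(x.shape)
        x_next = lookahead - alpha(t) * g     # momentum step, then gradient step
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x.
    quad_grad = lambda x: x
    step_size = lambda t: 1.0 / (t + 2)       # decaying step size
    momentum = lambda t: 0.5                  # constant here; any t-dependent rule works
    for variant in ("shb", "snag"):
        x_final = momentum_descent(quad_grad, [5.0, -3.0], 2000,
                                   step_size, momentum,
                                   variant=variant, noise_std=0.1)
        print(variant, np.linalg.norm(x_final))
```

The only difference between the two variants in this sketch is where the gradient is evaluated. Passing the schedules as functions of `t` keeps the time dependence of the momentum explicit.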

Taxonomy

Topics: Advanced Optimization Algorithms Research · Matrix Theory and Algorithms

Methods: Nesterov Accelerated Gradient · Sparse Evolutionary Training