The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods
Wei Tao, Sheng Long, Gaowei Wu, Qing Tao

TL;DR
This paper analyzes the convergence of the adaptive Polyak's Heavy-ball method and shows that its last iterate attains the optimal convergence rate in constrained convex optimization, with implications for deep learning.
Contribution
It provides the first theoretical proof of optimal individual convergence rates for adaptive Heavy-ball methods in convex settings, bridging theory and practice.
Findings
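The last iterate achieves the optimal $O(1/\sqrt{t})$ convergence rate in the constrained convex setting.
The choice of momentum parameters matters for last-iterate convergence, with implications for deep learning.
The analysis is validated empirically on convex problems and deep networks.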
Abstract
Adaptive stochastic gradient descent (SGD) with momentum has been widely adopted in deep learning as well as in convex optimization. In practice, the last iterate is commonly used as the final solution. However, the available regret analysis and the setting of constant momentum parameters only guarantee the optimal convergence of the averaged solution. In this paper, we fill this theory-practice gap by investigating the convergence of the last iterate (referred to as individual convergence), which is a more difficult task than the convergence analysis of the averaged solution. Specifically, in the constrained convex case, we prove that the adaptive Polyak's Heavy-ball (HB) method, in which only the step size is updated using the exponential moving average strategy, attains an optimal individual convergence rate of $O(1/\sqrt{t})$, as opposed to the optimality of…
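The update described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the momentum schedule `beta_t = t / (t + 2)`, the step size `alpha / ((t + 2) * sqrt(t))`, the L2-ball constraint, and all function and parameter names are hypothetical choices made here, since the summary above does not specify them. Only the general shape is taken from the text: a projected Heavy-ball step in which the step size alone is scaled by an exponential moving average of squared gradients, and the last iterate is returned rather than an average.

```python
import numpy as np


def project_l2_ball(w, radius=1.0):
    """Euclidean projection onto the ball {w : ||w||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)


def adaptive_heavy_ball(grad, w0, steps=2000, alpha=0.1, beta2=0.99,
                        eps=1e-8, radius=1.0):
    """Projected Heavy-ball with an EMA-scaled step size (illustrative sketch).

    The schedules below are assumptions for illustration, not taken from the paper.
    Returns the last iterate, not an average of iterates.
    """
    w_prev = np.array(w0, dtype=float)
    w = w_prev.copy()
    v = np.zeros_like(w)  # EMA of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        v = beta2 * v + (1.0 - beta2) * g * g
        # Per-coordinate adaptive step size (assumed decay schedule).
        step = alpha / ((t + 2.0) * np.sqrt(t) * (np.sqrt(v) + eps))
        # Time-varying momentum parameter (assumed schedule).
        beta_t = t / (t + 2.0)
        # Heavy-ball step: gradient move plus momentum on the raw iterate
        # difference, followed by projection onto the feasible set.
        w_next = project_l2_ball(w - step * g + beta_t * (w - w_prev), radius)
        w_prev, w = w, w_next
    return w


# Toy convex problem: squared distance to a target inside the unit ball.
target = np.array([0.3, -0.2])
objective = lambda w: 0.5 * np.sum((w - target) ** 2)
gradient = lambda w: w - target

w_last = adaptive_heavy_ball(gradient, np.zeros(2))
print("objective at last iterate:", objective(w_last))
```

Note that the momentum term acts on the difference of iterates and is not itself rescaled by the moving average, which matches the abstract's statement that only the step size is adapted.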
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
Methods: Stochastic Gradient Descent
