The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Wei Tao; Sheng Long; Gaowei Wu; Qing Tao

arXiv:2102.07314 · cs.LG · February 16, 2021 · 6 citations

TL;DR

This paper analyzes the convergence of the adaptive Polyak's Heavy-ball (HB) method, showing that its last iterate attains the optimal convergence rate in constrained convex optimization, with implications for momentum scheduling in deep learning.

Contribution

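It provides the first theoretical proof that adaptive Heavy-ball methods attain the optimal individual (last-iterate) convergence rate in the constrained convex setting, closing the gap between existing guarantees, which cover only the averaged solution, and the last-iterate solutions used in practice.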
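The distinction between averaged and individual convergence can be written out as follows. The notation (iterates $x_1, \dots, x_t$, constraint set $Q$, minimizer $x^*$) is assumed here for illustration rather than copied from the paper. Convexity of $f$ and Jensen's inequality turn an average-regret bound into a guarantee for the averaged solution $\bar{x}_t$, but they say nothing directly about the last iterate $x_t$, which is why last-iterate rates need a separate analysis.

$$
\bar{x}_t = \frac{1}{t}\sum_{i=1}^{t} x_i, \qquad
f(\bar{x}_t) - f(x^*) \le \frac{1}{t}\sum_{i=1}^{t}\bigl(f(x_i) - f(x^*)\bigr),
$$

$$
\text{individual convergence:}\quad \mathbb{E}\bigl[f(x_t)\bigr] - f(x^*) = O\!\left(\frac{1}{\sqrt{t}}\right), \qquad x^* \in \arg\min_{x \in Q} f(x).
$$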

Findings

1. Achieves an $O(1/\sqrt{t})$ convergence rate for the last iterate.
2. Highlights the importance of momentum scheduling in deep learning.
3. Validates the analysis empirically on convex functions and deep networks.

Abstract

The adaptive stochastic gradient descent (SGD) with momentum has been widely adopted in deep learning as well as convex optimization. In practice, the last iterate is commonly used as the final solution to make decisions. However, the available regret analysis and the setting of constant momentum parameters only guarantee the optimal convergence of the averaged solution. In this paper, we fill this theory-practice gap by investigating the convergence of the last iterate (referred to as individual convergence), which is a more difficult task than convergence analysis of the averaged solution. Specifically, in the constrained convex cases, we prove that the adaptive Polyak's Heavy-ball (HB) method, in which only the step size is updated using the exponential moving average strategy, attains an optimal individual convergence rate of $O(\frac{1}{\sqrt{t}})$, as opposed to the optimality of…
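The update described in the abstract (heavy-ball momentum on the iterates, with only the step size adapted through an exponential moving average) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the box constraint $[-1, 1]^d$, the schedules $\beta_t = t/(t+2)$ and $\alpha_t = \alpha/((t+2)\sqrt{t})$, the Adam-style bias correction, the toy objective $f(x) = \sum_i |x_i - c_i|$, and the noise level are all hypothetical choices made for this example. The function returns the last iterate rather than an average, since that is the quantity whose convergence the paper studies.

```python
import math
import random


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (the assumed constraint set Q)."""
    return [min(max(xi, lo), hi) for xi in x]


def adaptive_hb(grad, x0, steps, alpha=1.0, beta2=0.999, eps=1e-8):
    """Projected heavy-ball loop with an EMA-scaled step size; returns the LAST iterate.

    Hypothetical choices (not taken from the paper):
      momentum weight  beta_t  = t / (t + 2)
      base step size   alpha_t = alpha / ((t + 2) * sqrt(t))
      Adam-style bias correction of the squared-gradient EMA.
    """
    x_prev = list(x0)
    x = list(x0)
    v = [0.0] * len(x0)  # EMA of squared (stochastic) gradients, per coordinate
    for t in range(1, steps + 1):
        g = grad(x)
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        bias = 1.0 - beta2 ** t  # Adam-style bias correction factor
        beta_t = t / (t + 2.0)
        alpha_t = alpha / ((t + 2.0) * math.sqrt(t))
        # Heavy-ball step: scaled gradient step plus momentum on (x_t - x_{t-1}),
        # followed by projection back onto Q.
        candidate = [
            xi - alpha_t * gi / (math.sqrt(vi / bias) + eps) + beta_t * (xi - pi)
            for xi, gi, vi, pi in zip(x, g, v, x_prev)
        ]
        x_prev, x = x, project_box(candidate)
    return x


# Toy nonsmooth convex problem: f(x) = sum_i |x_i - c_i| on [-1, 1]^3, minimized at c.
c = [0.5, -0.3, 0.8]
rng = random.Random(0)


def f(x):
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def noisy_subgradient(x):
    # Subgradient sign(x_i - c_i) corrupted by Gaussian noise (sigma = 0.1).
    return [math.copysign(1.0, xi - ci) + rng.gauss(0.0, 0.1) for xi, ci in zip(x, c)]


x_last = adaptive_hb(noisy_subgradient, [0.0, 0.0, 0.0], steps=5000)
print(f"f(x_0) = {f([0.0, 0.0, 0.0]):.3f}, f(x_T) = {f(x_last):.4f}")
```

Keeping the moving average only in the step-size denominator, rather than also averaging gradients as Adam's first moment does, mirrors the abstract's description that only the step size is updated with the EMA strategy; the momentum enters solely through the $\beta_t (x_t - x_{t-1})$ term.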

Videos

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent