Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight

Wen-Liang Hwang

arXiv:2408.02678·cs.LG·August 7, 2024

TL;DR

This paper analyzes the convergence rate of the stochastic sub-gradient method with momentum when the step-size and the momentum weight follow independent sequences. It covers two step-size schedules, diminishing-to-zero and constant-and-drop, for strongly convex functions over a compact convex set with bounded sub-gradients.

Contribution

It offers a theoretical convergence analysis of stochastic gradient methods with independent step-size and momentum weight sequences, giving conditions for convergence and a justification for hyper-parameter choices used in practice.

Findings

01

For the diminishing-to-zero step-size, the convergence rate can be written as a product of a factor exponential in the step-size and a factor polynomial in the momentum weight.

02

Default momentum and diminishing step-size sequences are justified for large-scale learning.

03

For the stage-wise constant (constant-and-drop) step-size, conditions on the momentum weight for convergence are provided.

Abstract

In large-scale learning algorithms, the momentum term is usually included in the stochastic sub-gradient method to improve the learning speed because it can navigate ravines efficiently to reach a local minimum. However, step-size and momentum weight hyper-parameters must be appropriately tuned to optimize convergence. We thus analyze the convergence rate using stochastic programming with Polyak's acceleration of two commonly used step-size learning rates: "diminishing-to-zero" and "constant-and-drop" (where the sequence is divided into stages and a constant step-size is applied at each stage) under strongly convex functions over a compact convex set with bounded sub-gradients. For the former, we show that the convergence rate can be written as a product of exponential in step-size and polynomial in momentum weight. Our analysis justifies the convergence of using the default momentum…
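The setting described in the abstract can be illustrated with a minimal sketch, assuming the standard heavy-ball form of Polyak's momentum with projection onto the feasible set. The function names, the toy quadratic objective, the exact schedule formulas (`a0 / (k + 1)` and a halving drop per stage), and the momentum weight 0.9 are all assumptions chosen for illustration; none of them are taken from the paper.

```python
import random


def diminishing_step(k, a0=1.0):
    """Diminishing-to-zero schedule (assumed form): a0 / (k + 1)."""
    return a0 / (k + 1)


def constant_and_drop_step(k, a0=0.5, stage_len=500, drop=0.5):
    """Stage-wise constant schedule (assumed form): the step-size is held
    fixed within a stage and multiplied by `drop` at each new stage."""
    return a0 * drop ** (k // stage_len)


def project(x, radius):
    """Euclidean projection onto the compact convex set [-radius, radius]."""
    return max(-radius, min(radius, x))


def heavy_ball_sgd(step_fn, momentum_fn, n_iters=2000, target=1.0,
                   radius=5.0, noise=0.1, seed=0):
    """Projected stochastic gradient method with a heavy-ball momentum term.

    Toy objective (an assumption, not from the paper):
    f(x) = 0.5 * (x - target)**2, which is strongly convex, with bounded
    gradient noise so that stochastic gradients stay bounded on the set.
    """
    rng = random.Random(seed)
    x_prev = x = -radius  # start on the boundary of the feasible set
    for k in range(n_iters):
        g = (x - target) + rng.uniform(-noise, noise)  # noisy gradient
        # Step-size and momentum weight come from two independent sequences.
        x_next = project(x - step_fn(k) * g + momentum_fn(k) * (x - x_prev),
                         radius)
        x_prev, x = x, x_next
    return x


# Momentum weight 0.9 is an illustrative constant, not a value from the paper.
x_dim = heavy_ball_sgd(diminishing_step, lambda k: 0.9)
x_drop = heavy_ball_sgd(constant_and_drop_step, lambda k: 0.9)
print(abs(x_dim - 1.0), abs(x_drop - 1.0))
```

Passing the step-size and momentum weight as separate callables mirrors the idea of independent sequences: either one can be swapped (for example, a decaying `momentum_fn`) without touching the other.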

Taxonomy

Topics: Stochastic Gradient Optimization Techniques

Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings