Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight
Wen-Liang Hwang

TL;DR
This paper analyzes the convergence rates of stochastic gradient methods whose step-size and momentum weight sequences are chosen independently, covering the diminishing-to-zero and constant-and-drop step-size settings used in large-scale learning.
Contribution
The paper gives a theoretical convergence analysis of stochastic gradient methods with independent step-size and momentum weight sequences, including conditions for convergence and a justification of commonly used default settings.
Findings
For a diminishing-to-zero step-size, the convergence rate can be written as a product of a term exponential in the step-size and a term polynomial in the momentum weight.
The analysis justifies using the default momentum weight together with a diminishing-to-zero step-size sequence in large-scale learning.
For a stage-wise constant step-size, conditions for the momentum weight sequence to converge are provided.
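The two step-size settings named in these findings can be sketched as plain functions of the iteration index. This is only an illustration: the function names, the 1/(1 + decay·k) decay law, the stage length, and the drop factor are all assumptions made for the example, not values or forms taken from the paper.

```python
from typing import Callable


def diminishing_step(a0: float = 1.0, decay: float = 1.0) -> Callable[[int], float]:
    """Hypothetical diminishing-to-zero schedule: a0 / (1 + decay * k)."""
    def schedule(k: int) -> float:
        # The step-size shrinks toward zero as the iteration index k grows.
        return a0 / (1.0 + decay * k)
    return schedule


def constant_and_drop_step(
    a0: float = 0.1, stage_length: int = 100, drop: float = 0.5
) -> Callable[[int], float]:
    """Hypothetical stage-wise constant schedule.

    The step-size stays fixed within a stage of `stage_length` iterations
    and is multiplied by `drop` when the next stage begins.
    """
    def schedule(k: int) -> float:
        stage = k // stage_length  # index of the stage that contains k
        return a0 * drop ** stage
    return schedule
```

Either schedule can be passed wherever a function from iteration index to step-size is expected, which keeps the step-size choice independent of how the momentum weight is chosen.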
Abstract
In large-scale learning algorithms, the momentum term is usually included in the stochastic sub-gradient method to improve the learning speed because it can navigate ravines efficiently to reach a local minimum. However, the step-size and momentum weight hyper-parameters must be appropriately tuned to optimize convergence. We thus analyze the convergence rate, using stochastic programming with Polyak's acceleration, of two commonly used step-size learning rates: "diminishing-to-zero" and "constant-and-drop" (where the sequence is divided into stages and a constant step-size is applied at each stage), for strongly convex functions over a compact convex set with bounded sub-gradients. For the former, we show that the convergence rate can be written as a product of an exponential in the step-size and a polynomial in the momentum weight. Our analysis justifies the convergence of using the default momentum…
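As a rough illustration of the setting the abstract describes, the following is a minimal sketch of a projected stochastic sub-gradient iteration with a Polyak-style (heavy-ball) momentum term, in which the step-size and the momentum weight are supplied as two independent sequences. Everything specific here is an assumption for the example and is not taken from the paper: the function names, the Euclidean ball as the compact convex set, the quadratic test objective, the noise model, the 1/(1 + k) step-size, and the constant momentum weight of 0.5.

```python
import random
from typing import Callable, List


def project_onto_ball(x: List[float], radius: float) -> List[float]:
    """Euclidean projection onto the ball of the given radius (a compact convex set)."""
    norm = sum(v * v for v in x) ** 0.5
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [v * scale for v in x]


def heavy_ball_sgd(
    subgrad: Callable[[List[float]], List[float]],
    x0: List[float],
    step: Callable[[int], float],
    momentum: Callable[[int], float],
    num_iters: int,
    radius: float,
) -> List[float]:
    """Projected stochastic sub-gradient method with a heavy-ball momentum term.

    `step` and `momentum` are independent sequences indexed by the iteration k.
    """
    x_prev = list(x0)
    x = list(x0)
    for k in range(num_iters):
        g = subgrad(x)  # noisy sub-gradient at the current point
        a, b = step(k), momentum(k)
        # Sub-gradient step plus momentum along the previous displacement.
        x_next = [xi - a * gi + b * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, project_onto_ball(x_next, radius)
    return x


def run_example(seed: int = 0) -> List[float]:
    """Minimize the strongly convex f(x) = 0.5 * ||x - c||^2 over the unit ball."""
    rng = random.Random(seed)
    c = [0.5, -0.25]  # hypothetical minimizer, chosen inside the unit ball

    def noisy_grad(x: List[float]) -> List[float]:
        # Exact gradient (x - c) plus bounded zero-mean noise.
        return [xi - ci + rng.uniform(-0.1, 0.1) for xi, ci in zip(x, c)]

    return heavy_ball_sgd(
        subgrad=noisy_grad,
        x0=[0.0, 0.0],
        step=lambda k: 1.0 / (1.0 + k),  # diminishing-to-zero step-size
        momentum=lambda k: 0.5,          # constant momentum weight (assumed value)
        num_iters=2000,
        radius=1.0,
    )
```

Calling `run_example()` returns the final iterate, which should lie close to the hypothetical minimizer `c`. Swapping the `step` argument for a stage-wise constant schedule changes only that one argument, since the momentum sequence is passed separately.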
Taxonomy
Topics: Stochastic Gradient Optimization Techniques
Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
