Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou

TL;DR
This paper introduces new adaptive Polyak step-size variants for stochastic heavy ball methods, providing convergence guarantees and demonstrating improved practical performance in large-scale stochastic optimization tasks.
Contribution
The paper proposes three novel Polyak-type step-size selections for SHB (MomSPS, MomDecSPS, and MomAdaSPS), with convergence guarantees that require neither prior knowledge of problem parameters nor an interpolation assumption.
Findings
Convergence of MomSPS to a neighborhood of the solution for convex and smooth problems.
Fast convergence to the true solution under interpolation.
Experimental validation showing robustness and effectiveness.
Abstract
Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: MomSPS, MomDecSPS, and MomAdaSPS. For MomSPS, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems…
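To make the idea concrete, here is a minimal sketch of SHB driven by a Polyak-type step size on a consistent least-squares problem, where every per-sample loss reaches zero at the solution (the interpolation setting). The abstract does not state the MomSPS formula, so the step size used here, the capped stochastic Polyak ratio scaled by (1 - beta), is an assumption made for illustration and should not be read as the paper's definition. The names shb_polyak, make_problem, c, and gamma_b are hypothetical.

```python
import random


def make_problem(n, d, seed=0):
    """Build a consistent linear system A x* = b (hypothetical test problem)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = [sum(aj * xj for aj, xj in zip(a, x_star)) for a in A]
    return A, b, x_star


def shb_polyak(A, b, beta=0.5, c=1.0, gamma_b=10.0, steps=3000, seed=0):
    """Stochastic heavy ball on f_i(x) = 0.5 * (a_i . x - b_i)^2.

    Assumed step size (illustrative, not taken from the paper):
        gamma_t = (1 - beta) * min((f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), gamma_b)
    with f_i^* = 0, which holds because the system is consistent.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    x_prev = list(x)
    for _ in range(steps):
        i = rng.randrange(n)                      # sample one data point
        a = A[i]
        r = sum(aj * xj for aj, xj in zip(a, x)) - b[i]
        loss = 0.5 * r * r                        # f_i(x) - f_i^*, with f_i^* = 0
        grad = [r * aj for aj in a]
        g2 = sum(g * g for g in grad)
        polyak = loss / (c * g2) if g2 > 0.0 else 0.0
        gamma = (1.0 - beta) * min(polyak, gamma_b)
        # Heavy ball update: gradient step plus momentum term beta * (x - x_prev).
        x_next = [xj - gamma * gj + beta * (xj - pj)
                  for xj, gj, pj in zip(x, grad, x_prev)]
        x_prev, x = x, x_next
    return x


A, b, x_star = make_problem(n=50, d=5, seed=1)
x_hat = shb_polyak(A, b)
max_err = max(abs(u - v) for u, v in zip(x_hat, x_star))
```

The step size here needs no tuned learning rate: it is computed from the sampled loss and gradient at each iteration, and only the momentum parameter beta, the scaling c, and the cap gamma_b are left to choose. On problems without interpolation, f_i^* is generally nonzero and must be known or bounded for a rule of this kind to apply.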
Taxonomy
Topics: Stochastic processes and statistical mechanics · Markov Chains and Monte Carlo Methods · Random Matrices and Applications
Methods: Stochastic Gradient Descent
