New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
Francesco Orabona, Ryan D'Orazio

TL;DR
This paper offers a unified perspective on the Polyak stepsize by viewing it as gradient descent on a surrogate loss, clarifying the convergence properties and limitations of its variants.
Contribution
The paper introduces a simple framework that analyzes each Polyak stepsize variant as the minimization of a surrogate loss, which makes the variants' convergence guarantees and failure cases directly comparable.
Findings
Unified analysis of Polyak variants across assumptions
Negative results confirming non-convergence in some cases
Insight into local curvature adaptation of stepsizes
Abstract
The Polyak stepsize has been proven to be a fundamental stepsize in convex optimization, giving near optimal gradient descent rates across a wide range of assumptions. The universality of the Polyak stepsize has also inspired many stochastic variants, with theoretical guarantees and strong empirical performance. Despite the many theoretical results, our understanding of the convergence properties and shortcomings of the Polyak stepsize or its variants is both incomplete and fractured across different analyses. We propose a new, unified, and simple perspective for the Polyak stepsize and its variants as gradient descent on a surrogate loss. We show that each variant is equivalent to minimizing a surrogate function with stepsizes that adapt to a guaranteed local curvature. Our general surrogate loss perspective is then used to provide a unified analysis of existing variants across different…
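The surrogate view in the abstract can be illustrated with a small sketch. It assumes the classical deterministic Polyak stepsize, eta_t = (f(x_t) - f*) / ||g_t||^2, and one natural surrogate, phi(x) = 0.5 * (f(x) - f*)^2, whose gradient is (f(x) - f*) * g. A gradient step on phi with stepsize 1 / ||g_t||^2 then reproduces the Polyak update. The test function, the helper names, and this particular choice of surrogate are illustrative assumptions and may not match the paper's exact construction.

```python
import math

F_STAR = 0.0  # assumed known optimal value of the hypothetical test function below


def f(x):
    # Hypothetical convex, non-smooth test function, minimized at the origin.
    return abs(x[0]) + 2.0 * abs(x[1])


def subgrad(x):
    # One valid subgradient of f at x (0 is chosen at the kinks).
    sign = lambda v: (v > 0) - (v < 0)
    return [1.0 * sign(x[0]), 2.0 * sign(x[1])]


def polyak_step(x):
    # Classical Polyak update: x - (f(x) - f*) / ||g||^2 * g.
    g = subgrad(x)
    g2 = sum(gi * gi for gi in g)
    if g2 == 0.0:  # zero subgradient: x is already a minimizer
        return list(x)
    eta = (f(x) - F_STAR) / g2
    return [xi - eta * gi for xi, gi in zip(x, g)]


def surrogate_step(x):
    # Gradient step on phi(x) = 0.5 * (f(x) - f*)^2 with stepsize 1 / ||g||^2.
    g = subgrad(x)
    g2 = sum(gi * gi for gi in g)
    if g2 == 0.0:
        return list(x)
    phi_grad = [(f(x) - F_STAR) * gi for gi in g]  # chain rule on phi
    return [xi - (1.0 / g2) * pi for xi, pi in zip(x, phi_grad)]


def run(x0, steps=20):
    # Iterate the Polyak update and record the distance to the minimizer (the origin).
    x, dists = list(x0), [math.hypot(*x0)]
    for _ in range(steps):
        x = polyak_step(x)
        dists.append(math.hypot(*x))
    return x, dists
```

Calling `run([3.0, -1.0])` returns the final iterate and the list of distances to the minimizer. For a convex function with known f*, those distances should never increase, which is the standard property of the Polyak stepsize.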
Taxonomy
Topics: Matrix Theory and Algorithms · Mathematical Approximation and Integration · Probability and Risk Models
