New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes
Chang He, Wenzhi Gao, Bo Jiang, Madeleine Udell, Shuzhong Zhang

TL;DR
This paper gives a tight convergence analysis of gradient descent with the Polyak stepsize (PolyakGD), showing that its known rates cannot be improved in the worst case and that it adapts across function classes, including those satisfying Hölder smoothness and Hölder growth conditions.
Contribution
We establish the tightness of PolyakGD's known convergence rates and prove that it adapts universally to various function classes without prior knowledge of the class parameters.
Findings
The known convergence rates for smooth strongly convex functions and for smooth convex functions are both tight.
PolyakGD exploits floating-point errors to escape worst-case scenarios.
PolyakGD adapts automatically to different function classes.
Abstract
In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the rate for smooth strongly convex functions and the rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the…
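For readers unfamiliar with the method, here is a minimal sketch of gradient descent with the classical Polyak stepsize, x_{k+1} = x_k - ((f(x_k) - f*) / ||∇f(x_k)||²) ∇f(x_k). The update rule is the standard one from the literature rather than something quoted from this summary, and it assumes the optimal value f* is known. The function name `polyak_gd`, the two test functions, the starting point, and the iteration counts are all illustrative choices, not taken from the paper.

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, num_iters=100):
    """Gradient descent with the Polyak stepsize.

    Assumes f_star = min f is known; no smoothness or
    strong-convexity constants are passed in.
    """
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(num_iters):
        g = grad(x)
        g_norm_sq = float(g @ g)
        gap = f(x) - f_star
        # Stop if we are at a stationary point or already optimal.
        if g_norm_sq == 0.0 or gap <= 0.0:
            break
        x = x - (gap / g_norm_sq) * g
        history.append(x.copy())
    return x, history

# Illustrative smooth strongly convex quadratic (condition number 10).
A = np.diag([1.0, 10.0])

def quad(x):
    return 0.5 * float(x @ A @ x)

def quad_grad(x):
    return A @ x

# Illustrative function with a Hölder-continuous gradient:
# f(x) = ||x||^1.5 / 1.5, whose gradient is x / sqrt(||x||).
def holder(x):
    return float(np.linalg.norm(x) ** 1.5 / 1.5)

def holder_grad(x):
    n = np.linalg.norm(x)
    return np.zeros_like(x) if n == 0.0 else x / np.sqrt(n)

x0 = np.array([1.0, 1.0])
# The same routine is used for both functions; only f, grad, and f* change.
x_quad, hist_quad = polyak_gd(quad, quad_grad, x0, f_star=0.0, num_iters=500)
x_hold, hist_hold = polyak_gd(holder, holder_grad, x0, f_star=0.0, num_iters=50)

print("quadratic final value:", quad(x_quad))
print("Hölder-smooth final value:", holder(x_hold))
```

The same call handles both functions without any smoothness or growth constant being supplied, which is the sense of "universality" in the abstract; the price is that f* must be available. On the second example, the Polyak step works out to x/3 at every iteration in exact arithmetic, so the iterates contract linearly toward the minimizer at the origin.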
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
