Stochastic Polyak Stepsize with a Moving Target
Robert M. Gower, Aaron Defazio, Michael Rabbat

TL;DR
MOTAPS is a new stochastic gradient method that adaptively adjusts stepsizes using past loss values, extending the Stochastic Polyak method to settings without the interpolation condition, and demonstrating competitive performance.
Contribution
The paper introduces MOTAPS, a novel stochastic gradient method that extends the Stochastic Polyak (SP) method by removing its reliance on the interpolation condition and using per-data-point auxiliary variables to compute adaptive stepsizes.
Findings
MOTAPS converges globally under broad conditions.
MOTAPS performs competitively on convex and deep learning tasks.
Theoretical analysis links MOTAPS to online SGD variants.
Abstract
We propose a new stochastic gradient method called MOTAPS (Moving Targetted Polyak Stepsize) that uses recorded past loss values to compute adaptive stepsizes. MOTAPS can be seen as a variant of the Stochastic Polyak (SP) method, which also uses loss values to adjust the stepsize. The downside of the SP method is that it only converges when the interpolation condition holds. MOTAPS is an extension of SP that does not rely on the interpolation condition. MOTAPS uses auxiliary variables, one per data point, that track the loss value of that data point. We provide a global convergence theory for SP, an intermediary method TAPS, and MOTAPS by showing that they can all be interpreted as special variants of online SGD. We also perform several numerical experiments on convex learning problems and deep learning models for image classification and language…
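As a rough illustration of the SP baseline that the abstract describes, the sketch below applies the standard Stochastic Polyak step, x ← x − ((f_i(x) − f_i*) / ‖∇f_i(x)‖²) ∇f_i(x), to a small least-squares problem. It is not taken from the paper: the function names `sp_step` and `run_sp`, the synthetic data, and the choice of least squares are assumptions made for illustration. The data is constructed so that interpolation holds (every per-sample loss can reach zero, so f_i* = 0), which is the setting where SP is expected to converge. The MOTAPS update itself is not reproduced here because the abstract does not state it.

```python
import numpy as np

def sp_step(x, a_i, b_i, eps=1e-30):
    """One Stochastic Polyak step on f_i(x) = 0.5 * (a_i @ x - b_i)**2.

    Assumes interpolation, i.e. the optimal per-sample loss f_i^* is 0.
    """
    residual = a_i @ x - b_i
    loss = 0.5 * residual ** 2
    grad = residual * a_i
    grad_sq = grad @ grad
    if grad_sq < eps:
        # Gradient is (numerically) zero: this sample is already fit.
        return x
    # Polyak stepsize: (f_i(x) - f_i^*) / ||grad f_i(x)||^2 with f_i^* = 0.
    return x - (loss / grad_sq) * grad

def run_sp(A, b, n_steps=2000, seed=0):
    """Run SP with uniform sampling of data points (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(n_steps):
        i = rng.integers(A.shape[0])
        x = sp_step(x, A[i], b[i])
    return x

# Synthetic consistent linear system, so interpolation holds by construction.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true

x_hat = run_sp(A, b)
final_loss = 0.5 * np.mean((A @ x_hat - b) ** 2)
print("final mean loss:", final_loss)
```

If noise is added to `b` so that no single `x` fits every data point, the assumption f_i* = 0 no longer holds, which is the situation the abstract says MOTAPS is designed to handle by tracking a target loss value per data point.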
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Stochastic Gradient Descent
