A Stochastic Proximal Polyak Step Size
Fabian Schaipp, Robert M. Gower, Michael Ulbrich

TL;DR
ProxSPS is a new proximal variant of the stochastic Polyak step size method. It is designed to handle regularization in stochastic gradient descent better than plain SPS, and it is more stable and easier to tune.
Contribution
We introduce ProxSPS, a proximal version of SPS that handles regularization terms separately from the loss, and we provide a convergence analysis across several convexity settings.
Findings
ProxSPS performs comparably to AdamW on image classification tasks.
ProxSPS is easier to tune and more stable with regularization.
ProxSPS yields smaller weight parameters in neural networks.
Abstract
Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum of a loss and a regularizer, available estimates of a lower bound of the sum can be loose. In contrast, ProxSPS only requires a lower bound for the loss, which is often readily available. As a consequence, we show that ProxSPS is easier to tune and more stable in the presence of regularization. Furthermore, for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning, and results in a network with smaller weight parameters. We also provide an extensive convergence…
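The difference between the two lower bounds can be made concrete with a small sketch. The update formulas are not stated on this page, so the code below is an assumption: `sps_step` is a capped Polyak step on the full objective, and `prox_sps_step_l2` is what one obtains for a squared l2 regularizer (lam/2)·||x||² by replacing the loss with its linearization truncated at the lower bound and taking a proximal step. All function and parameter names are hypothetical, and the toy problem is deterministic rather than stochastic.

```python
def sps_step(x, grad, obj_value, lower_bound, max_step):
    """Capped Polyak step; needs a lower bound of the FULL objective."""
    g2 = sum(g * g for g in grad)
    if g2 == 0.0:
        return list(x)
    step = min(max_step, max(obj_value - lower_bound, 0.0) / g2)
    return [xi - step * gi for xi, gi in zip(x, grad)]


def prox_sps_step_l2(x, loss_grad, loss_value, loss_lower_bound, max_step, lam):
    """Sketch of a proximal Polyak step for the regularizer (lam/2)*||x||^2.

    Assumed derivation: minimize
        max(loss_value + <g, y - x>, loss_lower_bound)
        + (lam/2)*||y||^2 + ||y - x||^2 / (2*max_step)
    over y. Only a lower bound of the LOSS is required.
    """
    shrink = 1.0 + max_step * lam
    g2 = sum(g * g for g in loss_grad)
    if g2 == 0.0:
        # No loss gradient: only the regularizer shrinks the iterate.
        return [xi / shrink for xi in x]
    inner = sum(gi * xi for gi, xi in zip(loss_grad, x))
    gap = shrink * (loss_value - loss_lower_bound) - max_step * lam * inner
    step = min(max_step, max(gap, 0.0) / g2)
    return [(xi - step * gi) / shrink for xi, gi in zip(x, loss_grad)]


def run_demo(num_steps=50, lam=0.1, max_step=1.0):
    """Toy problem: loss(x) = 0.5*(x - 1)^2, whose lower bound is 0."""
    x = [0.0]
    for _ in range(num_steps):
        loss_value = 0.5 * (x[0] - 1.0) ** 2
        loss_grad = [x[0] - 1.0]
        x = prox_sps_step_l2(x, loss_grad, loss_value, 0.0, max_step, lam)
    return x


x_final = run_demo()
print(x_final[0], "vs regularized minimizer", 1.0 / 1.1)
```

On this toy problem the regularized minimizer is 1/(1 + lam), and its objective value is strictly positive. A tight lower bound for the sum would therefore require knowing the solution, while the bound 0 for the loss alone is immediate.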
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: AdamW · Semi-Pseudo-Label
