Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients
Dimitris Oikonomou, Nicolas Loizou

TL;DR
This paper introduces a new safeguarded stochastic Polyak step size for non-smooth optimization that guarantees convergence without strong assumptions and performs robustly in deep learning tasks.
Contribution
It proposes SPS$_{safe}$, a novel step size method with theoretical guarantees for non-smooth convex optimization, and demonstrates its effectiveness in deep neural network training.
Findings
Achieves competitive performance on deep neural networks.
Ensures convergence without requiring interpolation assumptions.
Maintains stable gradient norms, indicating robustness to vanishing gradients.
Abstract
The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. On non-smooth convex benchmarks, our experiments are consistent with the theoretical predictions on how the…
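For readers unfamiliar with Polyak-type step sizes, the following is a minimal sketch of a stochastic subgradient method using the earlier capped variant from the SPS literature, $\gamma_t = \min\{(f_i(x_t) - f_i^*) / (c\,\|g_t\|^2),\ \gamma_{\max}\}$. It is not the SPS$_{safe}$ rule proposed in this paper, whose exact form is not stated in the text above. The test problem (mean absolute error on a consistent linear system, so that $f_i^* = 0$), the function names, and the values of `c` and `gamma_max` are assumptions chosen for illustration only.

```python
import numpy as np

def sps_max_step(loss_i, loss_i_star, grad_sq_norm, c=1.0, gamma_max=10.0):
    """Capped Polyak step: min((f_i(x) - f_i^*) / (c * ||g||^2), gamma_max).

    Returns 0 when the sampled subgradient is exactly zero, which is an
    ad hoc guard for this sketch and not the safeguard from the paper.
    """
    if grad_sq_norm == 0.0:
        return 0.0
    return min((loss_i - loss_i_star) / (c * grad_sq_norm), gamma_max)

def run_subgradient_sps(A, b, iters=2000, c=1.0, gamma_max=10.0, seed=0):
    """Stochastic subgradient method on f(x) = mean_i |a_i^T x - b_i|."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)              # sample one data point
        r = A[i] @ x - b[i]              # residual of the sampled term
        g = np.sign(r) * A[i]            # a subgradient of |a_i^T x - b_i|
        # f_i^* = 0 for the absolute loss, so no knowledge of x^* is needed here
        gamma = sps_max_step(abs(r), 0.0, g @ g, c, gamma_max)
        x = x - gamma * g
    return x

def mean_abs_loss(A, b, x):
    return np.mean(np.abs(A @ x - b))

# Synthetic, consistent linear system (hypothetical data for illustration).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 10))
x_true = data_rng.standard_normal(10)
b = A @ x_true

initial_loss = mean_abs_loss(A, b, np.zeros(10))
x_hat = run_subgradient_sps(A, b)
final_loss = mean_abs_loss(A, b, x_hat)
print(f"initial loss: {initial_loss:.4f}, final loss: {final_loss:.2e}")
```

In this sketch the step is undefined when the sampled subgradient is zero and grows large when it is small, which is why the cap `gamma_max` and the zero check are present. The title suggests that the paper's safeguard targets that small-subgradient regime, but the sketch does not attempt to reproduce it.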
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
