Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM

Guillaume Garrigos; Robert M. Gower; Fabian Schaipp

arXiv:2307.14528 · cs.LG · July 28, 2023 · 1 citation

Open Access

TL;DR

This paper introduces adaptive SGD variants using Polyak stepsize and function splitting, with a focus on empirical risk minimization, but finds limited practical advantages over standard SGD.

Contribution

Develops the $\texttt{SPS}_+$ and $\texttt{FUVAL}$ methods, extending Polyak stepsize adaptive techniques to ERM with new analysis approaches.

Findings

01

$\texttt{SPS}_+$ achieves the best known convergence rates for SGD on non-smooth Lipschitz problems.

02

$\texttt{FUVAL}$ can be viewed as a projection, prox-linear, and online SGD method.

03

Full-batch $\texttt{FUVAL}$ shows minor advantages over GD, while the stochastic version does not outperform SGD.

Abstract

Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $SPS_{+}$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $SPS_{+}$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $SPS_{+}$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $FUVAL$, a variant of $SPS_{+}$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $FUVAL$, as a projection based…
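As a rough illustration of the idealized method described in the abstract, the sketch below takes the step size to be the positive part of the sampled loss gap divided by the squared gradient norm, $\gamma_t = (f_i(x_t) - f_i(x^*))_+ / \|g_t\|^2$. The function names `sps_plus_step` and `run_sps_plus`, the toy least-squares data, and the choice $f_i(x^*) = 0$ (valid here only because the toy system is exactly solvable) are assumptions made for this example and are not taken from the paper. The example uses a smooth loss for simplicity, whereas the paper's rates concern the Lipschitz non-smooth setting.

```python
import random

def sps_plus_step(x, loss_val, grad, loss_star):
    """One update x - gamma * grad, with gamma = max(loss_val - loss_star, 0) / ||grad||^2."""
    g_norm_sq = sum(g * g for g in grad)
    if g_norm_sq == 0.0:
        return list(x)  # zero (sub)gradient: stay put
    # Positive part keeps the step size from ever going negative.
    gamma = max(loss_val - loss_star, 0.0) / g_norm_sq
    return [xj - gamma * gj for xj, gj in zip(x, grad)]

def run_sps_plus(A, b, x0, n_steps=5000, seed=0):
    """Run the sketch on f_i(x) = 0.5 * (a_i . x - b_i)^2, sampling one i per step."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        i = rng.randrange(len(A))
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        loss_val = 0.5 * residual ** 2
        grad = [residual * a for a in A[i]]
        # Assumption: the system is consistent, so each sampled loss is 0 at the optimum.
        x = sps_plus_step(x, loss_val, grad, loss_star=0.0)
    return x

# Hypothetical toy problem: 10 consistent linear equations in 3 unknowns.
data_rng = random.Random(1)
x_true = [data_rng.gauss(0, 1) for _ in range(3)]
A = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x_hat = run_sps_plus(A, b, [0.0, 0.0, 0.0])
```

In this sketch `loss_star` must be supplied by the caller, which mirrors the idealized assumption that the sampled loss at optimality is known. According to the abstract, $FUVAL$ instead learns these values gradually; that part is not sketched here.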

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: Focus · Semi-Pseudo-Label · Stochastic Gradient Descent