The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Elizabeth Collins-Woodfin; Inbar Seroussi; Begoña García Malaxechebarría; Andrew W. Mackenzie; Elliot Paquette; Courtney Paquette

arXiv:2405.19585·math.OC·November 15, 2024·NeurIPS·1 cites

Open Access 1 Repo

TL;DR

This paper introduces a framework for analyzing the dynamics of stochastic gradient descent with adaptive learning rates, providing exact risk and learning rate curves, and exploring their behavior on high-dimensional least squares problems.

Contribution

It develops a deterministic ODE-based framework for exact risk and learning rate analysis of adaptive SGD algorithms in high dimensions, with detailed case studies.

Findings

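01 Exact expressions for the risk and learning rate curves, given by the deterministic solution of a system of ODEs.

02 Idealized exact line search can converge arbitrarily slower than SGD with the optimal fixed learning rate.

03 The AdaGrad-Norm learning rate converges to a constant inversely related to the eigenvalues of the data covariance.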

Abstract

We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct…
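As a rough illustration of the setting the abstract describes, the following is a minimal sketch of one-pass SGD with an AdaGrad-Norm learning rate on a synthetic least squares problem. It is not the authors' code. The function name `adagrad_norm_sgd`, the parameters `eta` and `b`, the diagonal covariance with eigenvalues in [0.5, 1.5], the noiseless labels, and the `1/d` and `1/d**2` scalings are all assumptions chosen for illustration.

```python
import numpy as np


def adagrad_norm_sgd(d=200, n_steps=4000, eta=1.0, b=1.0, seed=0):
    """One-pass SGD with an AdaGrad-Norm step size on synthetic least squares.

    Everything here is a hypothetical setup for illustration, not the
    construction used in the paper.
    """
    rng = np.random.default_rng(seed)

    # Assumed data covariance: diagonal, strictly positive eigenvalues.
    eigs = np.linspace(0.5, 1.5, d)
    theta_star = rng.standard_normal(d) / np.sqrt(d)  # target parameters
    theta = np.zeros(d)                               # initial iterate

    grad_sq_sum = 0.0            # running sum of scaled squared gradient norms
    risks = np.empty(n_steps)    # population risk after each step
    rates = np.empty(n_steps)    # learning rate used at each step

    for k in range(n_steps):
        # One pass: each sample is drawn fresh and used exactly once.
        x = np.sqrt(eigs) * rng.standard_normal(d)
        y = x @ theta_star       # noiseless label (assumption)

        # Stochastic gradient of 0.5 * (x @ theta - y) ** 2.
        grad = (x @ theta - y) * x

        # AdaGrad-Norm: step size shrinks with the accumulated gradient norms.
        # The 1 / d**2 scaling is an assumed high-dimensional normalization.
        grad_sq_sum += (grad @ grad) / d**2
        gamma = eta / np.sqrt(b**2 + grad_sq_sum)

        theta = theta - (gamma / d) * grad

        diff = theta - theta_star
        risks[k] = 0.5 * np.sum(eigs * diff**2)  # exact population risk
        rates[k] = gamma

    return risks, rates


if __name__ == "__main__":
    risks, rates = adagrad_norm_sgd()
    print("initial risk:", risks[0], "final risk:", risks[-1])
    print("initial rate:", rates[0], "final rate:", rates[-1])
```

Plotting `risks` and `rates` against `k / d` gives empirical risk and learning rate curves of the kind the paper's ODE framework is meant to describe; the sketch does not solve those ODEs.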

Code & Models

Repositories

amackenzie1/highline2024 · none · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Neural Networks and Applications · Online Learning and Analytics

Methods: Stochastic Gradient Descent