The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
Elizabeth Collins-Woodfin, Inbar Seroussi, Begoña García Malaxechebarría, Andrew W. Mackenzie, Elliot Paquette, Courtney Paquette

TL;DR
This paper introduces a framework for analyzing the dynamics of stochastic gradient descent with adaptive learning rates, providing exact risk and learning rate curves, and exploring their behavior on high-dimensional least squares problems.
Contribution
It develops a deterministic ODE-based framework for exact risk and learning rate analysis of adaptive SGD algorithms in high dimensions, with detailed case studies.
Findings
Exact expressions for risk and learning rate curves via ODEs.
Idealized line search can be slower than fixed learning rate SGD.
The AdaGrad-Norm learning rate converges to a constant inversely related to the eigenvalues of the data covariance.
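The setting behind these findings, one-pass SGD on least squares with an AdaGrad-Norm learning rate, can be illustrated with a minimal sketch. It is not the paper's code. The function name `one_pass_adagrad_norm`, the parameters `eta` and `b`, the identity data covariance, the noiseless labels, and the textbook form of the AdaGrad-Norm step (with no dimension-dependent scaling) are all assumptions made for illustration.

```python
import numpy as np

def one_pass_adagrad_norm(d=50, steps=2000, eta=1.0, b=1.0, seed=0):
    """One-pass SGD on least squares with an AdaGrad-Norm learning rate.

    Assumptions (not from the paper): identity data covariance, noiseless
    labels, and the textbook rule lr_k = eta / sqrt(b^2 + sum_j ||g_j||^2).
    """
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(d) / np.sqrt(d)  # hypothetical ground truth
    x = np.zeros(d)
    grad_sq_sum = 0.0
    risks, lrs = [], []
    for _ in range(steps):
        a = rng.standard_normal(d)      # fresh sample each step (one pass)
        y = a @ x_star                  # noiseless label
        g = (a @ x - y) * a             # gradient of 0.5 * (a.x - y)^2
        grad_sq_sum += g @ g            # running sum of squared gradient norms
        lr = eta / np.sqrt(b**2 + grad_sq_sum)
        x = x - lr * g
        # With identity covariance the population risk is 0.5 * ||x - x_star||^2.
        risks.append(0.5 * np.sum((x - x_star) ** 2))
        lrs.append(lr)
    return np.array(risks), np.array(lrs)

risks, lrs = one_pass_adagrad_norm()
print("final risk:", risks[-1], "final learning rate:", lrs[-1])
```

Plotting `risks` and `lrs` against the step index gives empirical risk and learning rate curves of the kind the ODE framework is meant to predict. By construction the learning rate in this sketch never increases.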
Abstract
We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover, we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Neural Networks and Applications · Online Learning and Analytics
Methods: Stochastic Gradient Descent
