A Probabilistically Motivated Learning Rate Adaptation for Stochastic Optimization

Filip de Roos; Carl Jidling; Adrian Wills; Thomas Schön; Philipp Hennig

arXiv:2102.10880 · cs.LG · February 23, 2021 · 1 citation

TL;DR

This paper introduces a probabilistic framework for automatically adapting learning rates in stochastic optimization, improving robustness and reducing manual tuning in deep learning training.

Contribution

It gives a probabilistic motivation, based on Gaussian inference, for popular stochastic first-order methods, and uses it to derive a meta-algorithm that adapts the learning rate automatically during training.
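One standard way Gaussian inference can produce a first-order step is sketched below. The prior x ~ N(x_k, Σ), the noise-free linearized observation, and the known target value f* are assumptions chosen for illustration, not necessarily the paper's exact construction. Conditioning the prior on the event that the first-order Taylor model of f at x_k equals f* gives:

```latex
\begin{aligned}
x &\sim \mathcal{N}(x_k, \Sigma), \qquad
y = f(x_k) + g_k^\top (x - x_k), \qquad g_k = \nabla f(x_k), \\
\mathbb{E}[x \mid y = f^*]
  &= x_k + \Sigma g_k \,\bigl(g_k^\top \Sigma g_k\bigr)^{-1} \bigl(f^* - f(x_k)\bigr)
   = x_k - \frac{f(x_k) - f^*}{g_k^\top \Sigma g_k}\, \Sigma g_k .
\end{aligned}
```

The posterior mean is the Polyak step measured in the metric Σ, and the scalar (f(x_k) - f*)/(g_k^T Σ g_k) plays the role of the learning rate.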

Findings

1. Robust adaptation of learning rates across a wide range of initial values
2. Effective on deep learning benchmark problems
3. Relates the learning rate to a dimensionless, controllable quantity

Abstract

Machine learning practitioners invest significant manual and computational resources in finding suitable learning rates for optimization algorithms. We provide a probabilistic motivation, in terms of Gaussian inference, for popular stochastic first-order methods. As an important special case, it recovers the Polyak step with a general metric. The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training by a control algorithm. The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values when applied to deep learning benchmark problems.
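The abstract notes that the framework recovers the Polyak step with a general metric as a special case. As a rough illustration, the Python sketch below implements the classical Polyak step, x_{k+1} = x_k - (f(x_k) - f*) / (g_k^T M g_k) · M g_k, for a symmetric positive-definite metric M. It assumes the optimal value f* is known and uses full (non-stochastic) gradients. The function name `polyak_step` and its parameters are hypothetical, and this is not the authors' meta-algorithm or their control scheme.

```python
import numpy as np


def polyak_step(x, f_x, grad, f_star=0.0, metric=None):
    """Return x - (f(x) - f*) / (g^T M g) * M g  (hypothetical helper).

    x      : current parameters (1-D array)
    f_x    : objective value f(x) at x
    grad   : gradient g of f at x
    f_star : assumed known optimal value f* (defaults to 0)
    metric : symmetric positive-definite matrix M; None means identity
    """
    x = np.asarray(x, dtype=float)
    grad = np.asarray(grad, dtype=float)
    m_grad = grad if metric is None else np.asarray(metric, dtype=float) @ grad
    denom = float(grad @ m_grad)  # g^T M g, positive whenever g != 0
    if denom == 0.0:
        return x.copy()  # zero gradient: no direction to move in
    step_size = (f_x - f_star) / denom  # the adaptive "learning rate"
    return x - step_size * m_grad


# Usage: f(x) = 0.5 * x^T A x, whose minimum value is f* = 0.
A = np.diag([1.0, 10.0])
x = np.array([1.0, 1.0])
for _ in range(5):
    f_x = 0.5 * x @ A @ x
    x = polyak_step(x, f_x, A @ x, f_star=0.0, metric=np.linalg.inv(A))
print(x)  # with M = A^{-1}, each step halves x
```

On this quadratic with M = A^{-1}, each step halves the iterate regardless of how badly A is conditioned, which hints at why the choice of metric matters. On a non-smooth function such as f(x) = |x| with f* = 0, a single step from x = 3 lands exactly on the minimizer.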

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques