Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

Victor Picheny; Vincent Dutordoir; Artem Artemev; Nicolas Durrande

arXiv:2006.14376·stat.ML·June 26, 2020

TL;DR

This paper introduces a probabilistic model using Gaussian processes for dynamically tuning the learning rate schedule of stochastic gradient descent, improving efficiency in training machine learning models.

Contribution

The paper presents a Bayesian optimisation approach, built on a flexible probabilistic model, for on-line and transfer learning of learning rate schedules.

Findings

01 Effective on-line adaptation of the learning rate is demonstrated.

02 Improved tuning across multiple similar tasks is shown.

03 The model handles abrupt changes in optimiser behaviour.

Abstract

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting them dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation, that flexibly adjusts to abrupt changes of behaviour induced by new learning rate values. As illustrated, this model is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the…
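To make the terms "learning rate schedule" and "trace of an optimiser" concrete, the following is a minimal sketch and not the paper's method. It runs plain SGD on a toy noisy quadratic with a piecewise-constant schedule and records the loss at every step. The function name sgd_trace, the objective, the noise level, and the schedule values are all assumptions chosen for illustration. The recorded trace changes character each time the learning rate changes, which is the kind of behaviour the abstract says the probabilistic model has to accommodate.

```python
import random


def sgd_trace(schedule, steps_per_stage=50, seed=0):
    """Run SGD on the toy objective f(w) = 0.5 * w**2 with noisy gradients.

    `schedule` is a list of learning rates, each held constant for
    `steps_per_stage` steps (a piecewise-constant schedule).
    Returns the list of loss values, one per step (the optimiser trace).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w = 5.0                    # hypothetical starting point
    trace = []
    for lr in schedule:
        for _ in range(steps_per_stage):
            grad = w + rng.gauss(0.0, 1.0)  # exact gradient plus unit Gaussian noise
            w -= lr * grad                  # SGD update
            trace.append(0.5 * w * w)       # loss after the update
    return trace


# Three stages with decreasing learning rates (illustrative values only).
trace = sgd_trace([0.5, 0.1, 0.01])
stage_means = [sum(trace[i:i + 50]) / 50 for i in range(0, len(trace), 50)]
print(stage_means)
```

Printing the per-stage mean loss gives one summary number per learning rate value; a tuning method of the kind the abstract describes would use such traces to decide which learning rate to try next.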
