Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation
Victor Picheny, Vincent Dutordoir, Artem Artemev, Nicolas Durrande

TL;DR
This paper introduces a probabilistic model of optimiser traces, built on latent Gaussian processes, for tuning the learning rate schedule of stochastic gradient descent on-line, so that a single expensive training run can adapt its learning rate as it goes.
Contribution
It presents a Bayesian optimisation approach built on a flexible probabilistic model of optimiser traces, used both to tune learning rate schedules on-line and to transfer them across similar tasks.
Findings
Demonstrates effective on-line adaptation of the learning rate
Shows improved tuning across multiple similar tasks
Handles abrupt changes in optimiser behaviour
Abstract
Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rate schedules using Bayesian optimisation has been tackled by several authors, adapting the schedule dynamically in a data-driven way is an open question. This is of high practical importance to users who need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-/regressive formulation, that flexibly adjusts to abrupt changes in behaviour induced by new learning rate values. As illustrated, this model is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the…
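To make the on-line setting concrete, here is a minimal sketch of choosing a learning rate segment by segment with a plain Gaussian process and a lower-confidence-bound rule. It is not the paper's latent-GP auto-regressive trace model: it swaps in a single stationary GP over log10 learning rates that predicts the per-segment log loss ratio. The toy quadratic loss, the grid `LOG_LR_GRID`, the segment length, the kernel lengthscale, the noise level and `beta` are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical setup: toy loss 0.5 * CURVATURE * w**2, learning rates on a log10 grid.
CURVATURE = 2.0
LOG_LR_GRID = [round(-3.0 + 0.1 * i, 1) for i in range(34)]  # 1e-3 up to 10**0.3


def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel (unit variance) on log10 learning rates."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(m):
    """Lower-triangular L with m = L L^T, for a small symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward(low, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward(low, b):
    """Solve L^T x = b by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(xs, ys, x_new, noise=1e-2):
    """Posterior mean and std of a zero-mean GP at x_new, given standardised targets ys."""
    n = len(xs)
    gram = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    low = cholesky(gram)
    alpha = backward(low, forward(low, ys))
    k_new = [rbf(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_new, alpha))
    v = forward(low, k_new)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def run_segment(w, log_lr, steps, rng, grad_noise=0.05):
    """Run noisy SGD steps on the toy loss; return new w and log(loss_end / loss_start)."""
    lr = 10.0 ** log_lr
    start = 0.5 * CURVATURE * w * w
    for _ in range(steps):
        w -= lr * (CURVATURE * w + rng.gauss(0.0, grad_noise))
    end = 0.5 * CURVATURE * w * w
    return w, math.log(end + 1e-300) - math.log(start + 1e-300)


def tune_online(n_segments=15, steps=10, beta=2.0, seed=0):
    """Pick each segment's learning rate by minimising the GP lower confidence bound."""
    rng = random.Random(seed)
    w, history = 1.0, []
    for _ in range(n_segments):
        if not history:
            log_lr = LOG_LR_GRID[0]  # cold start with the smallest learning rate
        else:
            xs = [x for x, _ in history]
            raw = [r for _, r in history]
            mu = sum(raw) / len(raw)
            sd = math.sqrt(sum((r - mu) ** 2 for r in raw) / len(raw)) or 1.0
            ys = [(r - mu) / sd for r in raw]  # standardise observed log loss ratios

            def lcb(x):
                m, s = gp_predict(xs, ys, x)
                return m - beta * s  # lower is better: we want the loss to shrink

            log_lr = min(LOG_LR_GRID, key=lcb)
        w, ratio = run_segment(w, log_lr, steps, rng)
        history.append((log_lr, ratio))
    return history, 0.5 * CURVATURE * w * w
```

Calling `history, final_loss = tune_online()` returns the chosen log10 learning rate and observed log loss ratio for every segment. Because the GP here is stationary, it treats a segment's loss reduction as depending only on the learning rate, even though the achievable reduction drifts as training approaches a noise floor; capturing that kind of state-dependent, abruptly changing behaviour is what the paper's trace model is designed for.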
