Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation

Ross M. Clarke; Elre T. Oldewage; José Miguel Hernández-Lobato

arXiv:2110.10461 · cs.LG · April 22, 2022

TL;DR

This paper introduces a one-pass hyperparameter optimisation method based on implicit differentiation. It tunes high-dimensional weight-update hyperparameters (such as learning rates and momenta) during a single training run, with no restarts, and applies to any continuous hyperparameter appearing in a differentiable weight update.

Contribution

It extends existing hypergradient methods to handle arbitrary differentiable hyperparameters in a single training episode, improving efficiency and applicability.

Findings

01. Performs competitively across multiple datasets and models.

02. Requires only 2-3 times the training time of vanilla training.

03. Applicable to any continuous hyperparameter in differentiable models.

Abstract

Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent…
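
To make the idea of a hypergradient concrete, the sketch below adapts a learning rate during a single training run on a toy quadratic problem. It is not the authors' algorithm: it uses a much simpler one-step lookahead approximation rather than implicit differentiation, and every name in it (`grad_quadratic`, `train_with_hypergradient`, `hyper_lr`, the toy targets and curvatures) is a hypothetical choice made for illustration only.

```python
def grad_quadratic(w, target, curvature):
    """Gradient of 0.5 * sum(c_i * (w_i - t_i)^2) with respect to w."""
    return [c * (wi - ti) for wi, ti, c in zip(w, target, curvature)]


def loss_quadratic(w, target, curvature):
    """Value of 0.5 * sum(c_i * (w_i - t_i)^2)."""
    return 0.5 * sum(c * (wi - ti) ** 2 for wi, ti, c in zip(w, target, curvature))


def train_with_hypergradient(steps=200, lr=0.01, hyper_lr=0.001):
    """One training run in which the learning rate is itself updated by gradient descent.

    Assumption: a one-step lookahead hypergradient, used here as a simplified
    stand-in for the paper's implicit-differentiation approach.
    """
    curvature = [1.0, 4.0]        # toy problem, chosen for illustration
    train_target = [1.0, -2.0]    # minimiser of the training loss
    val_target = [1.1, -1.9]      # minimiser of the validation loss
    w = [0.0, 0.0]

    for _ in range(steps):
        # Ordinary SGD-style weight update: w_new = w - lr * grad_train(w).
        g_train = grad_quadratic(w, train_target, curvature)
        w_new = [wi - lr * gi for wi, gi in zip(w, g_train)]

        # Since d(w_new)/d(lr) = -g_train, the chain rule gives
        # d L_val(w_new) / d(lr) = -grad_val(w_new) . g_train.
        g_val = grad_quadratic(w_new, val_target, curvature)
        hypergrad = -sum(gv * gt for gv, gt in zip(g_val, g_train))

        # Gradient step on the hyperparameter, kept non-negative.
        lr = max(lr - hyper_lr * hypergrad, 0.0)
        w = w_new

    return w, lr


if __name__ == "__main__":
    weights, learning_rate = train_with_hypergradient()
    print("final weights:", weights)
    print("final learning rate:", learning_rate)
```

In this toy setting the learning rate grows from its small initial value while the validation gradient and training gradient point the same way, then settles as the weights approach the training minimiser. The lookahead here only sees the effect of the most recent update; the paper's contribution, per the abstract, is an approximate hypergradient that aims to converge to the true hypergradient while still needing only one training episode.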

Code & Models

Repositories

rmclarke/optimisingweightupdatehyperparameters (PyTorch, official)

Videos

Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation · SlidesLive