Forward and Reverse Gradient-Based Hyperparameter Optimization

Luca Franceschi; Michele Donini; Paolo Frasconi; Massimiliano Pontil

arXiv:1703.01785 · stat.ML · December 13, 2017 · ICML · 56 citations

TL;DR

This paper compares forward-mode and reverse-mode procedures for computing the gradient of the validation error with respect to the hyperparameters of iterative learning algorithms such as stochastic gradient descent. It highlights their trade-offs in running time and memory, and reports experiments on data cleaning, on learning task interactions, and on one large-scale problem.

Contribution

It introduces and analyzes two gradient computation procedures for hyperparameters, offering practical solutions for large-scale and real-time hyperparameter optimization.

Findings

01

Forward-mode enables real-time hyperparameter updates.

02

Reverse-mode is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics.

03

One large-scale experiment is presented in which previous gradient-based methods would be prohibitive.

Abstract

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of running time and space requirements. Our formulation of the reverse-mode procedure is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics. The forward-mode procedure is suitable for real-time hyperparameter updates, which may significantly speed up hyperparameter optimization on large datasets. We present experiments on data cleaning and on learning task interactions. We also present one large-scale experiment where the use of previous gradient-based methods would be prohibitive.
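To make the two procedures concrete, here is a minimal sketch on a toy problem. It is not the authors' code. Everything in it is an assumption chosen for illustration: the quadratic training loss, the L2 coefficient `lam` as the single hyperparameter, the constants `H`, `b`, `w_val`, `ETA`, `T`, and the function names. The symbols `A`, `B`, `z`, and `alpha` are labels used in this sketch for the Jacobians of the update map, the forward-propagated derivative, and the backward-propagated adjoint.

```python
# Sketch: forward- and reverse-mode hypergradients for gradient descent on a
# regularized quadratic. Toy data and all names are illustrative assumptions.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Training loss: L(w) = 0.5 * w^T H w - b^T w + 0.5 * lam * ||w||^2
H = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite (assumed)
b = [1.0, -1.0]
w_val = [0.3, -0.2]            # validation error E(w) = 0.5 * ||w - w_val||^2
ETA, T = 0.1, 50               # step size and number of iterations

def step(w, lam):
    """One step of the dynamics w_{t+1} = Phi(w_t, lam)."""
    g = [hw - bi + lam * wi for hw, bi, wi in zip(matvec(H, w), b, w)]
    return [wi - ETA * gi for wi, gi in zip(w, g)]

def jac_w_times(v, lam):
    """A v, with A = dPhi/dw = I - ETA * (H + lam * I). A is symmetric here,
    so the same routine also computes A^T v."""
    return [vi - ETA * (hv + lam * vi) for vi, hv in zip(v, matvec(H, v))]

def jac_lam(w):
    """B = dPhi/dlam = -ETA * w."""
    return [-ETA * wi for wi in w]

def val_error(w):
    return 0.5 * sum((wi - vi) ** 2 for wi, vi in zip(w, w_val))

def val_grad(w):
    return [wi - vi for wi, vi in zip(w, w_val)]

def forward_hypergradient(lam):
    """Propagate z_t = dw_t/dlam alongside w_t; no trajectory is stored."""
    w, z = [0.0, 0.0], [0.0, 0.0]
    for _ in range(T):
        # z_{t+1} = A z_t + B(w_t), computed before w_t is overwritten
        z = [az + bz for az, bz in zip(jac_w_times(z, lam), jac_lam(w))]
        w = step(w, lam)
    return dot(val_grad(w), z)

def reverse_hypergradient(lam):
    """Store the whole trajectory, then propagate the adjoint backwards."""
    traj = [[0.0, 0.0]]
    for _ in range(T):
        traj.append(step(traj[-1], lam))
    alpha = val_grad(traj[-1])               # alpha_T = grad E(w_T)
    hg = 0.0
    for t in range(T - 1, -1, -1):
        hg += dot(jac_lam(traj[t]), alpha)   # accumulate B(w_t)^T alpha_{t+1}
        alpha = jac_w_times(alpha, lam)      # alpha_t = A^T alpha_{t+1}
    return hg

if __name__ == "__main__":
    print(forward_hypergradient(0.5), reverse_hypergradient(0.5))
```

In this sketch the forward pass keeps only the current `w` and `z`, so a hypergradient estimate is available at every iteration, which is what makes real-time updates possible. The reverse pass stores all `T + 1` iterates but needs no reversibility of the dynamics. With a single hyperparameter `z` is a vector; with many hyperparameters it becomes a matrix with one column per hyperparameter, which is where the running-time and memory trade-off between the two modes comes from.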

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
