Forward and Reverse Gradient-Based Hyperparameter Optimization
Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

TL;DR
This paper studies forward-mode and reverse-mode procedures for computing the gradient of the validation error with respect to the hyperparameters of iterative learning algorithms such as stochastic gradient descent. It compares their running-time and memory trade-offs and reports experiments on data cleaning, learning task interactions, and one large-scale problem.
Contribution
It introduces and analyzes two gradient computation procedures for hyperparameters, offering practical solutions for large-scale and real-time hyperparameter optimization.
Findings
Forward-mode enables real-time hyperparameter updates.
Reverse-mode is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics.
One large-scale experiment covers a setting where previous gradient-based methods would be prohibitive.
Abstract
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of running time and space requirements. Our formulation of the reverse-mode procedure is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics. The forward-mode procedure is suitable for real-time hyperparameter updates, which may significantly speed up hyperparameter optimization on large datasets. We present experiments on data cleaning and on learning task interactions. We also present one large-scale experiment where the use of previous gradient-based methods would be prohibitive.
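To make the two procedures concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. It assumes a toy setting chosen for illustration: ridge regression trained by full-batch gradient descent with a fixed step size eta, where the single hyperparameter lam is the L2 regularization weight. All function names, the synthetic data, and the constants are hypothetical. Forward-mode propagates the derivative of the weights with respect to lam alongside training; reverse-mode stores the weight trajectory and then runs a backward pass.

```python
import numpy as np

def make_problem(seed=0, n=40, m=20, d=5):
    # Synthetic regression data (an assumption for illustration only).
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X, Xv = rng.normal(size=(n, d)), rng.normal(size=(m, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    yv = Xv @ w_true + 0.1 * rng.normal(size=m)
    return X, y, Xv, yv

def val_error(w, Xv, yv):
    r = Xv @ w - yv
    return 0.5 * (r @ r) / len(yv)

def val_grad(w, Xv, yv):
    # Gradient of the validation error with respect to the weights.
    return Xv.T @ (Xv @ w - yv) / len(yv)

def step(w, lam, X, y, eta):
    # One gradient-descent step on the L2-regularized training loss.
    grad = X.T @ (X @ w - y) / len(y) + lam * w
    return w - eta * grad

def forward_hypergrad(lam, X, y, Xv, yv, eta=0.05, T=100):
    d = X.shape[1]
    H = X.T @ X / len(y)
    w = np.zeros(d)
    z = np.zeros(d)  # z_t = d w_t / d lam, carried along with training
    for _ in range(T):
        # z_t = (I - eta (H + lam I)) z_{t-1} - eta w_{t-1}
        z = z - eta * (H @ z + lam * z) - eta * w
        w = step(w, lam, X, y, eta)
    return val_grad(w, Xv, yv) @ z

def reverse_hypergrad(lam, X, y, Xv, yv, eta=0.05, T=100):
    d = X.shape[1]
    H = X.T @ X / len(y)
    traj = [np.zeros(d)]  # the whole trajectory is kept in memory
    for _ in range(T):
        traj.append(step(traj[-1], lam, X, y, eta))
    alpha = val_grad(traj[-1], Xv, yv)
    g = 0.0
    for t in range(T, 0, -1):
        # Direct dependence of step t on lam is -eta * w_{t-1}.
        g += (-eta * traj[t - 1]) @ alpha
        # Pull alpha back through the (symmetric) Jacobian of step t.
        alpha = alpha - eta * (H @ alpha + lam * alpha)
    return g

X, y, Xv, yv = make_problem()
g_fwd = forward_hypergrad(0.1, X, y, Xv, yv)
g_rev = reverse_hypergrad(0.1, X, y, Xv, yv)
print(g_fwd, g_rev)
```

In this sketch the two functions compute the same quantity and differ only in cost. The forward pass carries one extra vector per hyperparameter and never looks back, which is what allows hyperparameter updates during training. The reverse pass needs the stored trajectory but handles any number of hyperparameters with a single backward sweep.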
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
