Gradient-based Hyperparameter Optimization through Reversible Learning
Dougal Maclaurin, David Duvenaud, Ryan P. Adams

TL;DR
This paper introduces a method to compute exact gradients of cross-validation performance with respect to hyperparameters by reversing the training process, enabling efficient optimization of complex hyperparameter spaces.
Contribution
The paper computes hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum, which makes gradient-based tuning practical for thousands of hyperparameters at once.
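The reversal idea can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the update rule, the quadratic loss, and the names `grad`, `forward`, and `backward` are assumptions made for illustration. In floating point, dividing by the momentum decay does not exactly undo the earlier multiplication, so this sketch sidesteps rounding by using exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction

def grad(w):
    # Gradient of an assumed toy loss L(w) = (w - 3)^2 / 2.
    return w - 3

def forward(w, v, alpha, gamma, steps):
    # Assumed form of SGD with momentum:
    #   v <- gamma * v - (1 - gamma) * grad(w)
    #   w <- w + alpha * v
    for _ in range(steps):
        g = grad(w)
        v = gamma * v - (1 - gamma) * g
        w = w + alpha * v
    return w, v

def backward(w, v, alpha, gamma, steps):
    # Undo each step in the opposite order, starting from the final state.
    for _ in range(steps):
        w = w - alpha * v                  # recover the earlier weights
        g = grad(w)                        # gradient at the recovered weights
        v = (v + (1 - gamma) * g) / gamma  # recover the earlier velocity
    return w, v

w0, v0 = Fraction(0), Fraction(0)
alpha, gamma = Fraction(1, 10), Fraction(9, 10)

wT, vT = forward(w0, v0, alpha, gamma, 50)
w_rec, v_rec = backward(wT, vT, alpha, gamma, 50)
print(w_rec == w0, v_rec == v0)
```

Because the initial state can be reconstructed from the final one, the training trajectory does not have to be stored in order to differentiate through it.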
Findings
Enables optimization of thousands of hyperparameters.
Allows gradient-based tuning of complex hyperparameters like architectures and regularization.
Demonstrates effectiveness on various neural network configurations.
Abstract
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
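As a hedged illustration of chaining derivatives backwards through training, the sketch below computes the gradient of a validation loss with respect to a single step size on a one-dimensional problem. The losses, the update rule, and all function names are hypothetical choices for this example. The backward pass recovers earlier weights and velocities by reversing the updates in ordinary floating point, which is only approximately exact; the short run here keeps the rounding error negligible.

```python
TRAIN_HESSIAN = 2.0  # second derivative of the assumed training loss

def train_grad(w):
    # Gradient of the assumed training loss (w - 1.0)^2.
    return TRAIN_HESSIAN * (w - 1.0)

def val_loss(w):
    # Assumed validation loss with a different optimum than training.
    return 0.5 * (w - 1.5) ** 2

def val_grad(w):
    return w - 1.5

def train(alpha, gamma, steps, w=0.0, v=0.0):
    # Assumed SGD-with-momentum update, with a constant step size alpha.
    for _ in range(steps):
        v = gamma * v - (1.0 - gamma) * train_grad(w)
        w = w + alpha * v
    return w, v

def hypergrad_alpha(alpha, gamma, steps):
    # Reverse-mode pass through training, without storing the trajectory.
    w, v = train(alpha, gamma, steps)
    dw, dv, dalpha = val_grad(w), 0.0, 0.0
    for _ in range(steps):
        dalpha += dw * v                          # from w_t = w_{t-1} + alpha * v_t
        dv += alpha * dw
        w = w - alpha * v                         # recover w_{t-1}
        g = train_grad(w)
        v = (v + (1.0 - gamma) * g) / gamma       # recover v_{t-1}
        dw -= (1.0 - gamma) * dv * TRAIN_HESSIAN  # path through the gradient
        dv *= gamma
    return dalpha

alpha, gamma, steps, eps = 0.1, 0.9, 20, 1e-6
hg = hypergrad_alpha(alpha, gamma, steps)
# Central finite difference as an independent check.
fd = (val_loss(train(alpha + eps, gamma, steps)[0])
      - val_loss(train(alpha - eps, gamma, steps)[0])) / (2 * eps)
print(hg, fd)
```

The finite-difference check needs two extra training runs per hyperparameter, whereas the reverse pass yields the gradient in one sweep. That cost difference is what makes thousands of hyperparameters tractable.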
Taxonomy
Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications
