Gradient-based Hyperparameter Optimization through Reversible Learning

Dougal Maclaurin; David Duvenaud; Ryan P. Adams

arXiv:1502.03492 · stat.ML · April 3, 2015 · 403 cites

TL;DR

This paper introduces a method to compute exact gradients of cross-validation performance with respect to hyperparameters by reversing the training process, enabling efficient optimization of complex hyperparameter spaces.

Contribution

The paper computes hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum, which makes it practical to tune thousands of hyperparameters with gradient-based methods.

Findings

01. Enables optimization of thousands of hyperparameters.

02. Allows gradient-based tuning of complex hyperparameters like architectures and regularization.

03. Demonstrates effectiveness on various neural network configurations.

Abstract

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
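To make the idea of reversing training concrete, here is a minimal sketch in plain Python. It is not the authors' code: the scalar quadratic losses, the constants `A`, `B`, `C`, the L2 penalty `lam`, and all function names are assumptions chosen for illustration. The forward pass runs SGD with momentum and keeps only the final weight and velocity. The reverse pass undoes each update to recover the earlier state while accumulating derivatives of the validation loss with respect to the step size, the momentum, and the regularization strength.

```python
# Toy scalar problem; every constant and function name here is illustrative.
A, B, C = 2.0, 1.0, 0.5  # training curvature, training target, validation target


def train_grad(w, lam):
    """Gradient in w of the training loss 0.5*A*(w-B)**2 + 0.5*lam*w**2."""
    return A * (w - B) + lam * w


def train_hess(lam):
    """Second derivative in w of the same training loss."""
    return A + lam


def val_loss(w):
    """Validation loss whose hyperparameter gradients we want."""
    return 0.5 * (w - C) ** 2


def val_grad(w):
    return w - C


def train(w0, alpha, gamma, lam, steps):
    """Forward pass: SGD with momentum, keeping only the final (w, v)."""
    w, v = w0, 0.0
    for _ in range(steps):
        g = train_grad(w, lam)
        v = gamma * v - (1.0 - gamma) * g
        w = w + alpha * v
    return w, v


def hypergradients(w_final, v_final, alpha, gamma, lam, steps):
    """Reverse pass: undo each update and accumulate d(val_loss)/d(hyperparameter)."""
    w, v = w_final, v_final
    dw, dv = val_grad(w), 0.0
    d_alpha = d_gamma = d_lam = 0.0
    for _ in range(steps):
        d_alpha += dw * v                            # w_new = w_old + alpha * v_new
        w = w - alpha * v                            # recover w_old
        g = train_grad(w, lam)                       # gradient that was used at w_old
        v_old = (v + (1.0 - gamma) * g) / gamma      # recover v_old
        dv += alpha * dw                             # w_new depends on v_new
        d_gamma += dv * (v_old + g)                  # d(v_new)/d(gamma) = v_old + g
        d_lam -= (1.0 - gamma) * dv * w              # d(g)/d(lam) = w_old
        dw -= (1.0 - gamma) * dv * train_hess(lam)   # v_new depends on w_old through g
        dv *= gamma                                  # v_new depends on v_old
        v = v_old
    return w, d_alpha, d_gamma, d_lam


w0, alpha, gamma, lam, steps = 3.0, 0.1, 0.9, 0.1, 50
w_T, v_T = train(w0, alpha, gamma, lam, steps)
w0_recovered, d_alpha, d_gamma, d_lam = hypergradients(w_T, v_T, alpha, gamma, lam, steps)
print(w0_recovered, d_alpha, d_gamma, d_lam)
```

The reverse pass needs only the final `(w, v)` pair, so its memory use does not grow with the number of training steps. In ordinary floating point the division by `gamma` discards low-order bits, so this sketch recovers the trajectory only up to rounding error; bit-exact reversal requires extra bookkeeping that is not shown here.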

Taxonomy

Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications