Gradient-based Hyperparameter Optimization Over Long Horizons

Paul Micaelli; Amos Storkey

arXiv:2007.07869·cs.LG·October 1, 2021

TL;DR

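This paper introduces forward-mode differentiation with sharing (FDS), a gradient-based hyperparameter optimization method for long-horizon tasks. It uses forward-mode differentiation to address memory scaling and shares hyperparameters that are contiguous in time to address gradient degradation. On CIFAR-10 it outperforms greedy and black-box methods, with a reported 20x speedup over state-of-the-art black-box methods.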

Contribution

The paper proposes FDS, a new algorithm combining forward-mode differentiation and hyperparameter sharing to improve long-horizon hyperparameter optimization.

Findings

01

FDS reduces memory usage and gradient degradation.

02

FDS outperforms greedy and black-box methods on CIFAR-10.

03

FDS achieves 20x speedup over state-of-the-art black-box methods.

Abstract

Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online, but this introduces greediness, which comes with a significant performance drop. We propose forward-mode differentiation with sharing (FDS), a simple and efficient algorithm which tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time. We provide theoretical guarantees about the noise reduction properties of our algorithm, and demonstrate its efficiency empirically by differentiating through $\sim 10^{4}$ gradient steps of unrolled optimization. We consider large hyperparameter search…
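
To make the two ingredients named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy objective is a scalar quadratic, the validation loss equals the training loss, and the names `forward_mode_hypergradients`, `finite_difference`, `steps_per_lr`, and `curvature` are hypothetical. Each learning rate is shared across a block of contiguous SGD steps, and one tangent per shared learning rate is carried forward alongside the weight, so nothing from earlier steps has to be stored.

```python
def forward_mode_hypergradients(lrs, steps_per_lr, theta0=1.0, curvature=2.0):
    """Unrolled SGD on the toy loss 0.5 * curvature * theta**2.

    lrs[k] is shared by `steps_per_lr` contiguous steps (assumed schedule).
    Returns (final loss, [d loss / d lrs[k] for each k]).
    """
    theta = theta0
    # tangents[k] = d theta / d lrs[k], updated in lockstep with theta.
    tangents = [0.0] * len(lrs)
    for k, lr in enumerate(lrs):
        for _ in range(steps_per_lr):
            grad = curvature * theta  # gradient of the toy training loss
            # Differentiate theta_next = theta - lr * grad w.r.t. every lrs[j]:
            # the Hessian (here just `curvature`) propagates each tangent ...
            tangents = [z - lr * curvature * z for z in tangents]
            # ... and the currently active learning rate gets the direct term.
            tangents[k] -= grad
            theta = theta - lr * grad
    final_loss = 0.5 * curvature * theta ** 2
    final_grad = curvature * theta
    # Chain rule: d loss / d lrs[k] = (d loss / d theta) * (d theta / d lrs[k]).
    return final_loss, [final_grad * z for z in tangents]


def finite_difference(lrs, steps_per_lr, k, eps=1e-6):
    """Central-difference check of the hypergradient for lrs[k]."""
    up, down = list(lrs), list(lrs)
    up[k] += eps
    down[k] -= eps
    loss_up, _ = forward_mode_hypergradients(up, steps_per_lr)
    loss_down, _ = forward_mode_hypergradients(down, steps_per_lr)
    return (loss_up - loss_down) / (2 * eps)


if __name__ == "__main__":
    loss, hypergrads = forward_mode_hypergradients([0.1, 0.05], steps_per_lr=5)
    print(loss, hypergrads)
    print([finite_difference([0.1, 0.05], 5, k) for k in range(2)])
```

In this sketch the extra state is one tangent per shared learning rate and does not grow with the number of steps, while the per-step work grows with the number of hyperparameters; sharing keeps that number small. In a real network the scalar `curvature * z` term would be replaced by a Hessian-vector product.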

Code & Models

Repositories

polo5/fds · PyTorch · Official

Videos

Gradient-based Hyperparameter Optimization Over Long Horizons · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Model Reduction and Neural Networks

Methods: Weight Decay · Stochastic Gradient Descent