Gradient-based Hyperparameter Optimization Over Long Horizons
Paul Micaelli, Amos Storkey

TL;DR
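This paper introduces forward-mode differentiation with sharing (FDS), a gradient-based hyperparameter optimization method for long-horizon tasks. It uses forward-mode differentiation to avoid memory scaling issues and shares hyperparameters that are contiguous in time to reduce gradient degradation, outperforming greedy gradient-based methods and running faster than black-box methods.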
Contribution
The paper proposes FDS, a new algorithm combining forward-mode differentiation and hyperparameter sharing to improve long-horizon hyperparameter optimization.
Findings
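FDS tackles memory scaling with forward-mode differentiation, and reduces gradient degradation by sharing hyperparameters that are contiguous in time.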
FDS outperforms greedy and black-box methods on CIFAR-10.
FDS achieves 20x speedup over state-of-the-art black-box methods.
Abstract
Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online, but this introduces greediness, which comes with a significant performance drop. We propose forward-mode differentiation with sharing (FDS), a simple and efficient algorithm which tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time. We provide theoretical guarantees about the noise reduction properties of our algorithm, and demonstrate its efficiency empirically by differentiating through gradient steps of unrolled optimization. We consider large hyperparameter search…
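The two ideas named in the abstract, forward-mode differentiation and sharing hyperparameters over contiguous steps, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy 1-D quadratic training loss, plain gradient descent, and a learning-rate schedule in which each learning rate is shared by a block of consecutive steps. The function name and all parameters (`curvature`, `train_target`, `val_target`, `steps_per_lr`) are hypothetical and chosen only for illustration.

```python
def forward_mode_hypergradient(lrs, steps_per_lr, w0=0.0,
                               curvature=2.0, train_target=1.0, val_target=0.5):
    """Toy sketch: gradient of a validation loss w.r.t. shared learning rates.

    Training loss (assumed): 0.5 * curvature * (w - train_target) ** 2
    Validation loss (assumed): 0.5 * (w - val_target) ** 2
    lrs[k] is shared by steps k * steps_per_lr ... (k + 1) * steps_per_lr - 1.
    """
    w = w0
    # z[j] = d w_t / d lrs[j], carried forward alongside w.
    # Its size depends on the number of hyperparameters, not on the horizon.
    z = [0.0] * len(lrs)
    for t in range(len(lrs) * steps_per_lr):
        k = t // steps_per_lr  # index of the learning rate shared by this block
        grad = curvature * (w - train_target)
        # Differentiate the update w_{t+1} = w_t - lrs[k] * grad(w_t)
        # with respect to every lrs[j]: chain rule through w_t ...
        z = [zj * (1.0 - lrs[k] * curvature) for zj in z]
        # ... plus the direct dependence on lrs[k].
        z[k] -= grad
        w = w - lrs[k] * grad
    val_loss = 0.5 * (w - val_target) ** 2
    hypergrads = [(w - val_target) * zj for zj in z]
    return w, val_loss, hypergrads


if __name__ == "__main__":
    w, loss, hg = forward_mode_hypergradient([0.1, 0.05], steps_per_lr=5)
    print("final w:", w, "val loss:", loss, "hypergradients:", hg)
```

In this sketch, no intermediate weights are stored: only the current weight and one tangent per shared learning rate are kept, which is why the memory cost does not grow with the number of gradient steps. Increasing `steps_per_lr` reduces the number of hyperparameters, and each one then accumulates gradient contributions from more steps.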
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Model Reduction and Neural Networks
Methods: Weight Decay · Stochastic Gradient Descent
