Learning to Learn without Gradient Descent by Gradient Descent
Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

TL;DR
This paper introduces learned recurrent neural network optimizers trained via gradient descent that can efficiently optimize a wide range of black-box functions, demonstrating strong transferability and competitive performance against traditional Bayesian methods.
Contribution
The paper trains recurrent neural network optimizers on simple synthetic functions by gradient descent and shows that they generalize to a diverse set of derivative-free black-box optimization tasks.
Findings
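Optimizers trained on simple synthetic functions transfer to Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks.
On hyper-parameter tuning, the learned optimizers compare favourably with heavily engineered Bayesian optimization packages.
Up to the training horizon, they learn to trade off exploration and exploitation.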
Abstract
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
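The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the plain tanh RNN, the random quadratic training functions, the dummy first input, the summed-value loss, and all names (make_quadratic, init_params, rollout, meta_loss) are hypothetical choices made for this example. The sketch only unrolls an untrained optimizer and evaluates a loss over its trajectory; training by gradient descent would differentiate that loss with respect to the RNN weights, for instance with an automatic differentiation library, which is omitted here.

```python
import numpy as np

def make_quadratic(rng, dim):
    """Hypothetical synthetic objective f(x) = ||x - c||^2 with a random minimizer c."""
    c = rng.uniform(-1.0, 1.0, size=dim)
    return lambda x: float(np.sum((x - c) ** 2))

def init_params(rng, dim, hidden):
    """Random weights for a plain tanh RNN whose input is [x_prev, y_prev]."""
    scale = 0.1
    return {
        "W_in": rng.normal(0.0, scale, (hidden, dim + 1)),
        "W_h": rng.normal(0.0, scale, (hidden, hidden)),
        "b": np.zeros(hidden),
        "W_out": rng.normal(0.0, scale, (dim, hidden)),
    }

def rollout(params, f, dim, horizon):
    """Unroll the optimizer: each step sees only the previous query and its value."""
    h = np.zeros(params["b"].shape[0])
    x, y = np.zeros(dim), 0.0  # dummy first input (an assumption of this sketch)
    queries, values = [], []
    for _ in range(horizon):
        inp = np.concatenate([x, [y]])
        h = np.tanh(params["W_in"] @ inp + params["W_h"] @ h + params["b"])
        x = np.tanh(params["W_out"] @ h)  # next query, kept inside [-1, 1]^dim
        y = f(x)  # the optimizer observes only the function value at x
        queries.append(x)
        values.append(y)
    return np.array(queries), np.array(values)

def meta_loss(params, functions, dim, horizon):
    """Mean over training functions of the summed values along each trajectory."""
    return float(np.mean([rollout(params, f, dim, horizon)[1].sum() for f in functions]))

rng = np.random.default_rng(0)
dim, hidden, horizon = 2, 16, 10
params = init_params(rng, dim, hidden)
functions = [make_quadratic(rng, dim) for _ in range(8)]
loss = meta_loss(params, functions, dim, horizon)
print("meta-loss of the untrained optimizer:", loss)
```

Summing the values over the whole trajectory, rather than scoring only the final query, is one way to reward an optimizer for finding low values early. The horizon argument fixes the number of unrolled steps, which corresponds to the training horizon mentioned in the abstract.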
Taxonomy
Topics: Higher Education Learning Practices · Intelligent Tutoring Systems and Adaptive Learning
