Learning to learn by gradient descent by gradient descent
Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas

TL;DR
This paper casts the design of optimization algorithms as a learning problem: an optimizer implemented by an LSTM is trained to exploit problem structure automatically, and it outperforms generic, hand-designed optimizers on the tasks it is trained for.
Contribution
The paper presents an approach for learning optimization algorithms automatically, implemented by LSTMs, as a replacement for hand-designed methods, and reports improved performance on several problem types.
Findings
Learned optimizers outperform generic, hand-designed ones on the tasks for which they are trained.
The learned optimizers also generalize well to new tasks with similar structure.
Results are demonstrated on simple convex problems, neural network training, and styling images with neural art.
Abstract
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
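To make the idea of "casting optimizer design as a learning problem" concrete, here is a minimal sketch. It is not the paper's method: the LSTM optimizer is replaced by a single learnable step size, the problems are 1-D quadratics f(theta) = 0.5 * a * theta^2 with randomly sampled curvature a, and a finite-difference estimate stands in for backpropagation through the unrolled optimization steps. All names (`unrolled_loss`, `meta_gradient`, `meta_train`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def unrolled_loss(log_lr, curvatures, theta0=1.0, steps=20):
    """Meta-loss: objective values summed along the optimization trajectory,
    averaged over a batch of quadratics f(theta) = 0.5 * a * theta**2."""
    lr = math.exp(log_lr)  # optimizer parameter; log scale keeps it positive
    total = 0.0
    for a in curvatures:
        theta = theta0
        for _ in range(steps):
            grad = a * theta          # gradient of the optimizee
            theta -= lr * grad        # update proposed by the "learned" optimizer
            total += 0.5 * a * theta ** 2
    return total / len(curvatures)


def meta_gradient(log_lr, curvatures, eps=1e-4):
    """Central finite difference of the meta-loss with respect to log_lr.
    Assumption: this stands in for backprop through the unrolled steps."""
    up = unrolled_loss(log_lr + eps, curvatures)
    down = unrolled_loss(log_lr - eps, curvatures)
    return (up - down) / (2.0 * eps)


def meta_train(iterations=200, meta_lr=0.05, batch_size=8, seed=0):
    """Gradient descent on the optimizer's own parameter, using a fresh
    batch of randomly sampled problems at every meta-iteration."""
    rng = random.Random(seed)
    log_lr = math.log(0.01)  # deliberately poor starting step size
    for _ in range(iterations):
        curvatures = [rng.uniform(0.5, 2.0) for _ in range(batch_size)]
        g = meta_gradient(log_lr, curvatures)
        g = max(-10.0, min(10.0, g))  # clip to guard against divergent rollouts
        log_lr -= meta_lr * g
    return log_lr


if __name__ == "__main__":
    learned = meta_train()
    held_out = [0.7, 1.3, 1.9]  # problems not seen during meta-training
    print("initial loss:", unrolled_loss(math.log(0.01), held_out))
    print("learned loss:", unrolled_loss(learned, held_out))
```

The outer loop is gradient descent applied to the parameter of an inner gradient-descent procedure, which is the structure the title refers to. Evaluating on held-out curvatures mirrors, in a very small way, the claim about generalizing to new tasks with similar structure.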
Videos
What is Optimization? + Learning Gradient Descent | Two Minute Papers #82 · YouTube
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
