Learning to learn by gradient descent by gradient descent

Marcin Andrychowicz; Misha Denil; Sergio Gomez; Matthew W. Hoffman; David Pfau; Tom Schaul; Brendan Shillingford; Nando de Freitas

arXiv:1606.04474 · cs.NE · December 1, 2016 · 344 cites

TL;DR

This paper shows how optimization algorithms can themselves be learned with neural networks, so that the optimizer adapts automatically to problem structure and outperforms generic, hand-designed optimizers on the tasks it is trained for.

Contribution

It casts the design of an optimization algorithm as a learning problem: the optimizer is implemented by an LSTM and trained rather than designed by hand, which improves performance on the classes of problems it is trained on.

Findings

01. Learned algorithms outperform generic, hand-designed ones on the tasks for which they are trained.

02. The learned algorithms generalize well to new tasks with similar structure.

03. The approach is demonstrated on simple convex problems, neural network training, and image styling with neural art.

Abstract

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
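
The abstract's central idea, casting optimizer design as a learning problem, can be illustrated with a toy sketch. This is not the paper's LSTM optimizer. As a deliberately simplified stand-in, the "learned optimizer" below is a two-parameter momentum rule, and its parameters are meta-trained with finite-difference gradients instead of backpropagation through the unrolled trajectory. The function names (`make_quadratic`, `unrolled_loss`, `meta_train`), the random quadratic problem distribution, and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def make_quadratic(rng, dim=5):
    """Sample a random convex problem f(theta) = 0.5 * ||W theta - y||^2."""
    W = rng.normal(size=(dim, dim))
    y = rng.normal(size=dim)
    return W, y

def unrolled_loss(phi, problems, steps=20):
    """Meta-loss: sum of f(theta_t) along the trajectory, averaged over problems.

    phi = (log step size, raw momentum) parametrizes the update rule.
    This two-parameter rule is a toy stand-in for a learned LSTM optimizer.
    """
    log_lr, raw_mom = phi
    lr = np.exp(log_lr)                    # step size, kept positive
    mom = 1.0 / (1.0 + np.exp(-raw_mom))   # momentum, squashed into (0, 1)
    total = 0.0
    for W, y in problems:
        theta = np.zeros(W.shape[1])
        state = np.zeros_like(theta)       # optimizer state (momentum buffer)
        for _ in range(steps):
            grad = W.T @ (W @ theta - y)   # gradient of the optimizee loss
            state = mom * state + grad
            theta = theta - lr * state     # update proposed by the optimizer
            resid = W @ theta - y
            total += 0.5 * resid @ resid   # accumulate f(theta_t)
    return total / len(problems)

def meta_train(problems, phi0, iters=30, eps=1e-3, meta_lr=0.1):
    """Descend the meta-loss with central finite differences.

    A candidate step is accepted only if it lowers the meta-loss;
    otherwise the meta step size is halved.
    """
    phi = np.array(phi0, dtype=float)
    best = unrolled_loss(phi, problems)
    for _ in range(iters):
        grad = np.zeros_like(phi)
        for i in range(len(phi)):
            d = np.zeros_like(phi)
            d[i] = eps
            grad[i] = (unrolled_loss(phi + d, problems)
                       - unrolled_loss(phi - d, problems)) / (2 * eps)
        cand = phi - meta_lr * grad / (np.linalg.norm(grad) + 1e-12)
        cand_loss = unrolled_loss(cand, problems)
        if cand_loss < best:
            phi, best = cand, cand_loss
        else:
            meta_lr *= 0.5
    return phi, best

rng = np.random.default_rng(0)
train_problems = [make_quadratic(rng) for _ in range(8)]
phi0 = [np.log(0.01), 0.0]                 # initial step size 0.01, momentum 0.5
before = unrolled_loss(np.array(phi0), train_problems)
phi, after = meta_train(train_problems, phi0)
print(f"meta-loss before: {before:.3f}, after: {after:.3f}")
```

The design choice worth noting is that the meta-loss sums the optimizee loss over the whole trajectory rather than only the final step, so the update rule is rewarded for making progress early. Swapping the momentum rule for a recurrent network and the finite differences for automatic differentiation would move this sketch closer to the setup the abstract describes.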

Code & Models

Videos

What is Optimization? + Learning Gradient Descent | Two Minute Papers #82 · YouTube

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning