Learned Optimizers that Scale and Generalize

Olga Wichrowska; Niru Maheswaranathan; Matthew W. Hoffman; Sergio Gomez Colmenarejo; Misha Denil; Nando de Freitas; Jascha Sohl-Dickstein

arXiv:1703.04813·cs.LG·September 11, 2017·115 cites

TL;DR

This paper presents a learned optimizer built on a hierarchical RNN architecture with low per-parameter memory and computation overhead. It outperforms RMSProp/ADAM on its meta-training corpus of small optimization tasks and generalizes to new tasks, including training large neural networks on ImageNet.

Contribution

Introduces a hierarchical RNN-based learned optimizer that scales to larger problems, reduces memory and computation overhead, and generalizes to unseen tasks and large-scale neural networks.

Findings

01

Outperforms RMSProp/ADAM on the small, diverse tasks in its meta-training corpus

02

Generalizes to unseen neural network architectures

03

Successfully trains large models on ImageNet

Abstract

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural…
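As a rough illustration of what "hierarchical RNN with minimal per-parameter overhead" can mean, the toy sketch below keeps a very small recurrent state for every parameter and a larger state per tensor, with all weights shared across parameters. Everything here is an assumption made for illustration: the class name `ToyHierarchicalOptimizer`, the plain tanh cells, the two-level layout, the hidden sizes, and the random untrained weights are hypothetical and do not reproduce the paper's architecture or its meta-training.

```python
import numpy as np


def rnn_cell(x, h, W):
    """Plain tanh RNN cell (an illustrative stand-in, not the paper's cell)."""
    return np.tanh(np.concatenate([x, h], axis=-1) @ W)


class ToyHierarchicalOptimizer:
    """Two-level toy: a tiny state per parameter, a larger state per tensor."""

    def __init__(self, shapes, param_hidden=4, tensor_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.ph, self.th = param_hidden, tensor_hidden
        # Per-parameter cell input: [gradient, tensor-level state].
        # The same weights are shared by every parameter.
        self.W_param = 0.1 * rng.standard_normal(
            (1 + tensor_hidden + param_hidden, param_hidden))
        # Per-tensor cell input: mean of that tensor's parameter states.
        self.W_tensor = 0.1 * rng.standard_normal(
            (param_hidden + tensor_hidden, tensor_hidden))
        # Linear readout from a parameter's state to its update.
        self.w_out = 0.1 * rng.standard_normal(param_hidden)
        # Per-parameter memory cost is only `param_hidden` floats.
        self.h_param = [np.zeros((int(np.prod(s)), param_hidden)) for s in shapes]
        self.h_tensor = [np.zeros(tensor_hidden) for _ in shapes]

    def step(self, params, grads):
        """Return updated parameters given the current gradients."""
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            n = g.size
            # Every parameter in the tensor sees the same tensor-level context.
            ctx = np.broadcast_to(self.h_tensor[i], (n, self.th))
            x = np.concatenate([g.reshape(n, 1), ctx], axis=1)
            self.h_param[i] = rnn_cell(x, self.h_param[i], self.W_param)
            # The tensor-level state summarizes its parameters by averaging.
            summary = self.h_param[i].mean(axis=0)
            self.h_tensor[i] = rnn_cell(summary, self.h_tensor[i], self.W_tensor)
            update = self.h_param[i] @ self.w_out
            new_params.append(p + update.reshape(p.shape))
        return new_params
```

With untrained weights the updates are not expected to reduce any loss. In a learned optimizer, the shared weights (`W_param`, `W_tensor`, and `w_out` in this sketch) are what meta-training would adjust.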

Code & Models

Repositories

Kolin96/learning-to-learn (TensorFlow)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection