Learning Gradient Descent: Better Generalization and Longer Horizons

Kaifeng Lv; Shunhua Jiang; Jian Li

arXiv:1703.03633 · cs.LG · June 13, 2017 · 39 cites

TL;DR

This paper introduces a learned optimizer that improves training efficiency and generalization of neural networks, outperforming traditional and existing learning-to-learn optimizers across various architectures.

Contribution

A novel learning-to-learn model with practical tricks that surpasses existing optimizers in multiple neural network training tasks.

Findings

01 Outperforms hand-crafted optimizers on multiple tasks

02 Effective on deep MLPs, CNNs, and LSTMs

03 Enhances generalization and training stability

Abstract

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes, and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time-consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical tricks. Our optimizer outperforms generic, hand-crafted optimization algorithms and state-of-the-art learning-to-learn optimizers by DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a number of tasks, including deep MLPs, CNNs, and simple LSTMs.

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Neural Networks and Applications