Learning Gradient Descent: Better Generalization and Longer Horizons
Kaifeng Lv, Shunhua Jiang, Jian Li

TL;DR
This paper introduces a learned (learning-to-learn) optimizer for training neural networks that generalizes better and stays effective over longer training horizons. It outperforms hand-crafted optimizers and DeepMind's learning-to-learn optimizers on many tasks, including deep MLPs, CNNs, and simple LSTMs.
Contribution
A new learning-to-learn model, together with several practical training tricks, that outperforms hand-crafted optimizers and prior learning-to-learn optimizers on many neural network training tasks.
Findings
Outperforms hand-crafted optimizers on multiple tasks
Effective on deep MLPs, CNNs, and simple LSTMs
Enhances generalization and training stability
Abstract
Training deep neural networks is a highly nontrivial task that involves carefully selecting appropriate training algorithms, scheduling step sizes, and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and time-consuming. Recently, researchers have tried to use deep learning algorithms to exploit the landscape of the loss function of the training problem of interest, and learn how to optimize over it in an automatic way. In this paper, we propose a new learning-to-learn model and some useful and practical tricks. Our optimizer outperforms generic, hand-crafted optimization algorithms and state-of-the-art learning-to-learn optimizers by DeepMind in many tasks. We demonstrate the effectiveness of our algorithms on a number of tasks, including deep MLPs, CNNs, and simple LSTMs.
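To make the learning-to-learn idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's model: the abstract does not describe the architecture, so this uses a hypothetical two-parameter, coordinate-wise update rule (a learned weight on the gradient and on a running average of gradients), toy quadratic problems as the optimizee, and finite-difference meta-gradients instead of backpropagation through the unrolled trajectory. All function names and constants are assumptions chosen for illustration.

```python
import math
import random

def loss(w, a):
    """Toy optimizee (assumed): f(w) = 0.5 * sum_i a_i * w_i^2."""
    return 0.5 * sum(ai * wi * wi for ai, wi in zip(a, w))

def grad(w, a):
    """Gradient of the toy quadratic with respect to w."""
    return [ai * wi for ai, wi in zip(a, w)]

def learned_step(g, m, theta):
    """Hypothetical coordinate-wise update rule with learnable weights theta."""
    return -(theta[0] * g + theta[1] * m)

def unrolled_loss(theta, task, steps=20, beta=0.9):
    """Run the learned rule on one task and sum the losses along the trajectory."""
    a, w0 = task
    w, m, total = list(w0), [0.0] * len(w0), 0.0
    for _ in range(steps):
        g = grad(w, a)
        # Running average of gradients, fed to the update rule as a second input.
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
        w = [wi + learned_step(gi, mi, theta) for wi, gi, mi in zip(w, g, m)]
        total += loss(w, a)
    return total

def meta_loss(theta, tasks):
    """Average unrolled loss over a set of tasks."""
    return sum(unrolled_loss(theta, t) for t in tasks) / len(tasks)

def meta_train(tasks, theta, meta_steps=50, step_size=0.01, eps=1e-4):
    """Normalized gradient descent on theta using central finite differences."""
    theta = list(theta)
    for _ in range(meta_steps):
        meta_grad = []
        for i in range(len(theta)):
            hi, lo = list(theta), list(theta)
            hi[i] += eps
            lo[i] -= eps
            meta_grad.append((meta_loss(hi, tasks) - meta_loss(lo, tasks)) / (2 * eps))
        norm = math.sqrt(sum(d * d for d in meta_grad)) + 1e-12
        theta = [t - step_size * d / norm for t, d in zip(theta, meta_grad)]
    return theta

def make_tasks(n, dim, rng):
    """Sample random quadratics: curvatures a and starting points w0."""
    return [([rng.uniform(0.5, 2.0) for _ in range(dim)],
             [rng.uniform(-1.0, 1.0) for _ in range(dim)]) for _ in range(n)]

rng = random.Random(0)
train_tasks = make_tasks(8, 5, rng)
heldout_tasks = make_tasks(8, 5, rng)

theta_init = [0.1, 0.0]  # starts as plain gradient descent with step size 0.1
theta_learned = meta_train(train_tasks, theta_init)

print("train   :", meta_loss(theta_init, train_tasks), "->", meta_loss(theta_learned, train_tasks))
print("held-out:", meta_loss(theta_init, heldout_tasks), "->", meta_loss(theta_learned, heldout_tasks))
```

The held-out comparison is a small stand-in for the generalization question in the title: the update rule is tuned on one set of problems and then evaluated on problems it has not seen. A realistic learned optimizer would replace `learned_step` with a trained neural network and the quadratics with actual network training losses.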
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Neural Networks and Applications
