Neural Optimizer Search with Reinforcement Learning
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

TL;DR
This paper introduces a reinforcement learning method that automatically discovers update rules for training neural networks. The discovered optimizers outperform commonly used ones such as Adam, RMSProp, and SGD on CIFAR-10, and they transfer to other tasks and architectures.
Contribution
It presents a neural optimizer search framework in which a recurrent neural network controller, trained with reinforcement learning, generates candidate update rules. The search yields two new optimizers for deep learning, PowerSign and AddSign.
Findings
Discovered update rules outperform Adam, RMSProp, and SGD (with and without momentum) on CIFAR-10 with a ConvNet model
Introduced two new optimizers: PowerSign and AddSign
New optimizers transfer well to other tasks and architectures, including ImageNet classification and neural machine translation
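To make the two named optimizers concrete, here is a minimal pure-Python sketch of their update steps. It follows the commonly cited forms, in which PowerSign scales the gradient by `base ** (sign(g) * sign(m))` and AddSign scales it by `1 + sign(g) * sign(m)`, where `m` is a running average of the gradient. The function names, the default `lr`, `beta`, and `base` values, and the plain-list representation of parameters are assumptions made for illustration, not the authors' implementation.

```python
import math


def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def update_moving_average(m, g, beta=0.9):
    """Exponential moving average of the gradient (beta is an assumed default)."""
    return [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]


def powersign_step(w, g, m, lr=0.1, base=math.e):
    """One PowerSign-style step: w <- w - lr * base**(sign(g) * sign(m)) * g.

    The step is amplified when the gradient and its running average agree
    in sign and damped when they disagree.
    """
    return [
        wi - lr * base ** (sign(gi) * sign(mi)) * gi
        for wi, gi, mi in zip(w, g, m)
    ]


def addsign_step(w, g, m, lr=0.1):
    """One AddSign-style step: w <- w - lr * (1 + sign(g) * sign(m)) * g.

    The step is doubled on sign agreement and zeroed on disagreement.
    """
    return [
        wi - lr * (1.0 + sign(gi) * sign(mi)) * gi
        for wi, gi, mi in zip(w, g, m)
    ]


if __name__ == "__main__":
    # Toy objective f(w) = sum(w_i ** 2), whose gradient is 2 * w.
    w = [1.0, -2.0]
    m = [0.0, 0.0]
    for _ in range(5):
        g = [2.0 * wi for wi in w]
        m = update_moving_average(m, g)
        w = addsign_step(w, g, m)
    print(w)
```

Both rules reuse the raw gradient for the step direction; the running average only decides how much to trust it, which is why neither needs a per-parameter second-moment estimate.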
Abstract
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine…
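The idea of describing an update equation as an expression over primitive functions can be sketched with a toy evaluator. The nested-tuple encoding, the primitive names (`g`, `m`, `sign`, `exp`, `add`, `mul`), and the `evaluate` helper below are hypothetical stand-ins for the string representation mentioned in the abstract; the actual domain specific language and its full list of primitives are not reproduced here.

```python
import math

# Hypothetical primitive sets; the real search space is larger.
CONSTANTS = {"1": 1.0}
UNARY = {
    "identity": lambda x: x,
    "sign": lambda x: float((x > 0) - (x < 0)),
    "exp": math.exp,
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def evaluate(expr, state):
    """Evaluate a nested-tuple expression for a single scalar parameter.

    `state` maps operand names (here "g" for the gradient and "m" for its
    running average) to their current values.
    """
    if isinstance(expr, str):
        if expr in CONSTANTS:
            return CONSTANTS[expr]
        return state[expr]
    op, *args = expr
    values = [evaluate(arg, state) for arg in args]
    if op in UNARY and len(values) == 1:
        return UNARY[op](values[0])
    if op in BINARY and len(values) == 2:
        return BINARY[op](values[0], values[1])
    raise ValueError(f"unknown operation or wrong arity: {op!r}")


# Two candidate rules written in the toy encoding.
POWERSIGN = ("mul", ("exp", ("mul", ("sign", "g"), ("sign", "m"))), "g")
ADDSIGN = ("mul", ("add", "1", ("mul", ("sign", "g"), ("sign", "m"))), "g")

if __name__ == "__main__":
    state = {"g": 2.0, "m": 0.5}
    print(evaluate(POWERSIGN, state), evaluate(ADDSIGN, state))
```

In a search loop, a controller would emit expressions like these, each would be used to train a small model for a few epochs, and the resulting accuracy would serve as the reward signal.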
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
Methods: Adam · Stochastic Gradient Descent
