Symbolic Discovery of Optimization Algorithms
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Yao Liu, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le

TL;DR
This paper introduces a novel method for discovering optimization algorithms through program search, resulting in Lion, an efficient optimizer that outperforms existing methods across various deep learning tasks.
Contribution
The paper formulates algorithm discovery as program search and uses it to find Lion, an optimizer that tracks only momentum and is therefore more memory-efficient than Adam.
Findings
Lion improves image classification accuracy by up to 2% on ImageNet.
Lion reduces training compute by up to 5x on JFT.
Lion outperforms Adam in various vision and language tasks.
Abstract
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image…
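The two properties the abstract highlights (a single momentum buffer and a sign-based update of uniform magnitude) can be illustrated with a minimal sketch. This is not the authors' code: the function name `lion_step`, the interpolation coefficients `beta1` and `beta2`, the decoupled `weight_decay` term, and all default values are assumptions based on how the Lion update is commonly described, and none of them is stated in this summary.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,           # assumed default, not from the summary
    beta1: float = 0.9,         # assumed default, not from the summary
    beta2: float = 0.99,        # assumed default, not from the summary
    weight_decay: float = 0.0,  # assumed decoupled weight decay
) -> Tuple[List[float], List[float]]:
    """One sign-momentum update over flat lists of scalars (illustrative only)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend the stored momentum with the current gradient.
        c = beta1 * m + (1.0 - beta1) * g
        # Keep only the sign, so every parameter moves by the same magnitude.
        update = sign(c) + weight_decay * p
        new_params.append(p - lr * update)
        # Momentum is the only optimizer state carried to the next step.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


# Hypothetical toy values, chosen only to exercise the function.
params, momentum = lion_step(
    params=[1.0, -2.0, 0.5],
    grads=[0.3, -4.0, 0.0],
    momentum=[0.0, 0.0, 0.0],
    lr=0.1,
)
```

In this sketch the first two parameters each move by exactly `lr` even though their gradients differ by more than a factor of ten, which is the uniform-magnitude behaviour the abstract contrasts with adaptive optimizers. Adam would additionally keep a second-moment buffer per parameter; here `momentum` is the only state.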
Code & Models
- 🤗 replit/replit-code-v1-3b (model) · 837 dl · ♡ 742
- 🤗 TehVenom/MPT-7b-storywriter-Apache-2.0 (model) · 119 dl · ♡ 5
- 🤗 TehVenom/MPT-7b-Apache-2.0 (model) · ♡ 2
- 🤗 gl198976/mpt-7b (model) · 173 dl · ♡ 2
- 🤗 gl198976/MPT-7b-storywriter-Apache-2.0 (model) · ♡ 2
- 🤗 lentan/replit (model) · 36 dl · ♡ 3
- 🤗 P1ayer-1/mpt-7b-instruct-base (model) · 140 dl · ♡ 2
- 🤗 Green-Sky/ggml-mpt-7b-storywriter (model) · ♡ 14
- 🤗 TheBloke/MPT-7B-GGML (model) · 6 dl · ♡ 21
- 🤗 TheBloke/MPT-7B-Storywriter-GGML (model) · 23 dl · ♡ 56
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: Evolved Sign Momentum · Adafactor · Diffusion · Adam
