Celo: Training Versatile Learned Optimizers on a Compute Diet
Abhinav Moudgil, Boris Knyazev, Guillaume Lajoie, Eugene Belilovsky

TL;DR
Celo is a learned optimizer that achieves strong meta-generalization and outperforms existing optimizers on diverse tasks while using significantly fewer computational resources, enabling practical off-the-shelf use.
Contribution
The paper presents Celo, a learned optimizer architecture and training procedure that significantly improves meta-generalization with minimal meta-training time.
Findings
Celo outperforms state-of-the-art optimizers on diverse tasks.
Celo requires only 24 GPU hours for meta-training.
Celo demonstrates strong generalization to out-of-distribution tasks.
Abstract
Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, that can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources, 4000 TPU months, to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose…
Taxonomy
Topics: Fuzzy Logic and Control Systems
Methods: Sparse Evolutionary Training
