Celo: Training Versatile Learned Optimizers on a Compute Diet

Abhinav Moudgil; Boris Knyazev; Guillaume Lajoie; Eugene Belilovsky

arXiv:2501.12670 · cs.LG · June 23, 2025

TL;DR

Celo introduces a new learned optimizer that achieves strong meta-generalization and outperforms existing optimizers on diverse tasks while requiring significantly fewer computational resources, making it practical to use off the shelf.

Contribution

The paper presents Celo, a learned optimizer architecture and training procedure that significantly improves meta-generalization with minimal meta-training time.

Findings

1. Celo outperforms state-of-the-art optimizers on diverse tasks.
2. Celo requires only 24 GPU hours for meta-training.
3. Celo demonstrates strong generalization to out-of-distribution tasks.

Abstract

Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, which can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources (4000 TPU months) to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose…
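To make "learned update rule" and "meta-training" concrete, here is a toy sketch in JAX (the language of the official repository). It is NOT Celo's architecture or meta-training procedure: the per-parameter MLP, its two input features (gradient and momentum), the fixed 0.1 output scale, the random quadratic task distribution, the unroll length, and every function name below are hypothetical choices for illustration only. The meta-parameters `theta` define the update rule; meta-training differentiates the average loss of an unrolled inner optimization with respect to `theta`.

```python
import jax
import jax.numpy as jnp

# Toy learned optimizer (hypothetical, NOT Celo): a tiny MLP applied
# independently to every parameter, mapping (gradient, momentum) -> update.

def init_meta_params(key, hidden=8):
    k1, k2 = jax.random.split(key)
    return {
        "w1": 0.1 * jax.random.normal(k1, (2, hidden)),
        "b1": jnp.zeros(hidden),
        "w2": 0.1 * jax.random.normal(k2, (hidden, 1)),
        "b2": jnp.zeros(1),
    }

def learned_update(theta, grad, mom):
    # Per-parameter features, shape (n, 2).
    feats = jnp.stack([grad, mom], axis=-1)
    h = jnp.tanh(feats @ theta["w1"] + theta["b1"])
    out = (h @ theta["w2"] + theta["b2"])[..., 0]
    # Fixed output scale (assumed value) keeps the initial updates small.
    return -0.1 * out

def sample_task(key, dim=5):
    # Hypothetical task distribution: quadratic bowls with random minima.
    return jax.random.normal(key, (dim,))

def task_loss(x, target):
    return jnp.sum((x - target) ** 2)

def unrolled_meta_loss(theta, x0, target, steps=20, beta=0.9):
    # Run the learned optimizer for `steps` inner steps and return the mean
    # inner loss along the trajectory (the meta-objective).
    def step(carry, _):
        x, mom = carry
        g = jax.grad(task_loss)(x, target)
        mom = beta * mom + (1 - beta) * g
        x = x + learned_update(theta, g, mom)
        return (x, mom), task_loss(x, target)

    _, losses = jax.lax.scan(step, (x0, jnp.zeros_like(x0)), None, length=steps)
    return jnp.mean(losses)

@jax.jit
def meta_sgd_step(theta, target, lr):
    # One step of plain SGD on the meta-parameters for a single task.
    x0 = jnp.zeros_like(target)
    loss, grads = jax.value_and_grad(unrolled_meta_loss)(theta, x0, target)
    new_theta = jax.tree_util.tree_map(lambda p, g: p - lr * g, theta, grads)
    return new_theta, loss

def meta_train(key, num_iters=200, lr=1e-3):
    # Meta-train over freshly sampled tasks; returns the learned update rule.
    key, init_key = jax.random.split(key)
    theta = init_meta_params(init_key)
    for _ in range(num_iters):
        key, task_key = jax.random.split(key)
        theta, _ = meta_sgd_step(theta, sample_task(task_key), lr)
    return theta
```

Calling `meta_train(jax.random.PRNGKey(0))` returns meta-parameters that can then be applied, unchanged, to new tasks through `learned_update`. Evaluating `unrolled_meta_loss` on tasks drawn from a different distribution than the one used for meta-training is a toy analogue of the meta-generalization question raised above; real learned optimizers such as VeLO and Celo use far richer features, architectures, and task distributions.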

Code & Models

Repositories

amoudgl/celo (JAX, official)
