PyLO: Towards Accessible Learned Optimizers in PyTorch

Paul Janson; Benjamin Therien; Quentin Anthony; Xiaolong Huang; Abhinav Moudgil; Eugene Belilovsky

arXiv:2506.10315·cs.LG·April 20, 2026

TL;DR

PyLO introduces a user-friendly PyTorch library for learned optimizers, enabling large-scale pre-training and performance improvements in real-world machine learning tasks.

Contribution

The paper presents PyLO, a PyTorch-based library with CUDA-accelerated learned optimizers, making advanced optimization techniques accessible and practical for large-scale applications.

Findings

1. CUDA implementations significantly increase training throughput.

2. Learned optimizers benefit from being combined with existing optimization tools.

3. PyLO enables large-scale pre-training with learned optimizers.

Abstract

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances such as VeLO, which was meta-trained for 4000 TPU-months, remain largely inaccessible to the broader community, in part because they rely on JAX and lack user-friendly packages for using the optimizers independently after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the remaining ~70% of the machine learning community via the familiar torch.optim.Optimizer interface. Unlike prior work focused on limited-scale academic tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our systems contribution includes CUDA-accelerated…
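To make the "drop-in replacement via torch.optim.Optimizer" idea concrete, here is a minimal sketch of what such an interface can look like. It is not PyLO's API: the class name TinyLearnedOptimizer, the two input features (gradient and momentum), and the randomly initialized weights standing in for meta-trained ones are all assumptions made for illustration. Only torch.optim.Optimizer and the standard zero_grad/backward/step loop come from PyTorch itself.

```python
import torch
from torch import nn


class TinyLearnedOptimizer(torch.optim.Optimizer):
    """Hypothetical toy optimizer: a small MLP maps per-parameter features to an update."""

    def __init__(self, params, step_size=1e-2, beta=0.9, hidden=8):
        super().__init__(params, dict(step_size=step_size, beta=beta))
        # Assumption: random weights stand in for meta-trained ones.
        gen = torch.Generator().manual_seed(0)
        self.w1 = torch.randn(2, hidden, generator=gen) * 0.1
        self.w2 = torch.randn(hidden, 1, generator=gen) * 0.1

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "momentum" not in state:
                    state["momentum"] = torch.zeros_like(p)
                m = state["momentum"]
                # Exponential moving average of the gradient.
                m.mul_(group["beta"]).add_(p.grad, alpha=1 - group["beta"])
                # One feature row per scalar parameter: [gradient, momentum].
                feats = torch.stack([p.grad.flatten(), m.flatten()], dim=1)
                w1, w2 = self.w1.to(p), self.w2.to(p)
                update = torch.tanh(feats @ w1) @ w2
                p.add_(update.view_as(p), alpha=-group["step_size"])
        return loss


# Usage: the training loop is the same as with torch.optim.Adam.
torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = TinyLearnedOptimizer(model.parameters())
x, y = torch.randn(16, 4), torch.randn(16, 1)
before = [p.detach().clone() for p in model.parameters()]

opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```

Because the class subclasses torch.optim.Optimizer, it can be swapped in for another optimizer by changing only the constructor line. The untrained weights here will not optimize well; the sketch shows the interface, not the performance of a meta-trained optimizer.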

Code & Models

Repositories

Belilovsky-Lab/pylo (GitHub)
