TL;DR
PyLO introduces a user-friendly PyTorch library for learned optimizers, enabling large-scale pre-training and performance improvements in real-world machine learning tasks.
Contribution
The paper presents PyLO, a PyTorch-based library with CUDA-accelerated learned optimizers, making advanced optimization techniques accessible and practical for large-scale applications.
Findings
CUDA implementations significantly increase training throughput
Learned optimizers benefit from combining with existing tools
PyLO enables large-scale pre-training with learned optimizers
Abstract
Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances such as VeLO, which was meta-trained for 4000 TPU-months, remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for independently using the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the remaining ~70% of the machine learning community via the familiar torch.optim.Optimizer interface. Unlike prior work focused on limited-scale academic tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our systems contribution includes CUDA-accelerated…
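To make the idea of a learned optimizer concrete, here is a minimal, framework-free sketch of what "replacing a hand-designed update rule with a small network" can look like. It is not PyLO's API and not the architecture of VeLO or any published optimizer: the class name `TinyLearnedOptimizer`, the two input features (gradient and momentum), and the hand-set weights standing in for meta-trained ones are all assumptions made for illustration. Only the `step()`-style usage pattern is meant to echo the familiar optimizer interface mentioned in the abstract.

```python
import math


class TinyLearnedOptimizer:
    """Toy per-parameter update rule computed by a tiny MLP.

    Hypothetical illustration only: this is not PyLO's API, and the
    weights below are hand-set rather than meta-trained.
    """

    def __init__(self, params, weights, lr=0.1, beta=0.9):
        self.params = params                  # list of floats, updated in place
        self.w1, self.b1, self.w2 = weights   # MLP weights (assumed given)
        self.lr = lr
        self.beta = beta
        self.momentum = [0.0] * len(params)   # per-parameter optimizer state

    def _update_rule(self, grad, mom):
        # MLP: 2 input features -> tanh hidden units -> scalar update.
        hidden = [
            math.tanh(w[0] * grad + w[1] * mom + b)
            for w, b in zip(self.w1, self.b1)
        ]
        return sum(v * h for v, h in zip(self.w2, hidden))

    def step(self, grads):
        # Apply the same small network independently to every parameter.
        for i, g in enumerate(grads):
            self.momentum[i] = self.beta * self.momentum[i] + (1 - self.beta) * g
            self.params[i] -= self.lr * self._update_rule(g, self.momentum[i])


def loss(params, targets):
    # Simple quadratic objective used only for the demo.
    return sum((p - t) ** 2 for p, t in zip(params, targets))


def gradient(params, targets):
    return [2.0 * (p - t) for p, t in zip(params, targets)]


# Hand-set weights standing in for meta-trained ones (assumption).
weights = (
    [(1.0, 0.0), (0.0, 1.0)],  # w1: one hidden unit per input feature
    [0.0, 0.0],                # b1
    [0.5, 0.5],                # w2
)

params = [3.0, -2.0]
targets = [1.0, 0.5]
optimizer = TinyLearnedOptimizer(params, weights)

initial_loss = loss(params, targets)
for _ in range(300):
    optimizer.step(gradient(params, targets))
final_loss = loss(params, targets)

print(f"initial loss: {initial_loss:.4f}, final loss: {final_loss:.2e}")
```

In a real learned optimizer the network weights come from meta-training over many tasks, and the per-parameter network evaluation is the part that a CUDA kernel would plausibly accelerate. The training loop itself looks the same as with a hand-designed optimizer, which is what makes a drop-in replacement possible.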
