Practical tradeoffs between memory, compute, and performance in learned optimizers
Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

TL;DR
This paper analyzes the trade-offs between memory, compute, and performance in learned optimizers and introduces a new learned optimizer that is faster and more memory-efficient.
Contribution
It identifies key design features affecting resource trade-offs and develops a learned optimizer that improves on speed and memory efficiency.
Findings
Quantifies memory, compute, and performance trade-offs across many learned and hand-designed optimizers.
Develops a learned optimizer that is faster and more memory-efficient.
Provides open-source code for the proposed optimizer.
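A back-of-envelope way to see the memory and compute side of these trade-offs: optimizer state grows linearly with the number of accumulators kept per parameter, and a learned optimizer that applies a small MLP to every parameter adds a roughly fixed FLOP cost per parameter per step. The sketch below assumes float32 state and a one-hidden-layer MLP. The learned-optimizer accumulator count and the MLP sizes are hypothetical, chosen for illustration rather than taken from the paper.

```python
# Accumulators kept per parameter. SGD, momentum SGD, and Adam follow their
# standard definitions; "learned_3_momenta" is a HYPOTHETICAL learned optimizer
# with three momentum buffers, not a configuration reported in the paper.
ACCUMULATORS_PER_PARAM = {
    "sgd": 0,                # plain SGD: no optimizer state
    "sgd_momentum": 1,       # one momentum buffer
    "adam": 2,               # first- and second-moment estimates
    "learned_3_momenta": 3,  # hypothetical learned optimizer
}


def optimizer_state_bytes(num_params: int, accumulators: int, bytes_per_value: int = 4) -> int:
    """Optimizer-state memory, assuming one float32 value per accumulator per parameter."""
    return num_params * accumulators * bytes_per_value


def per_param_mlp_flops(d_in: int, hidden: int, d_out: int = 1) -> int:
    """Approximate FLOPs to apply a one-hidden-layer MLP to one parameter's features.

    Counts each multiply-add as 2 FLOPs and ignores biases and activations.
    """
    return 2 * (d_in * hidden + hidden * d_out)


if __name__ == "__main__":
    n = 100_000_000  # an example 100M-parameter model
    for name, k in ACCUMULATORS_PER_PARAM.items():
        print(f"{name:>18}: {optimizer_state_bytes(n, k) / 1e9:.1f} GB of state")
    # Hypothetical per-parameter MLP: 4 input features, 32 hidden units.
    print("MLP FLOPs per parameter per step:", per_param_mlp_flops(4, 32))
```

For a 100M-parameter model this gives 0.8 GB of Adam state and 1.2 GB for the hypothetical three-buffer learned optimizer, before counting parameters, gradients, or activations.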
Abstract
Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and…
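To make the abstract's description concrete, here is a minimal pure-Python sketch of a learned optimizer. Where SGD applies the fixed rule x -= lr * g, this version applies a tiny per-parameter MLP (hypothetical `learned_step`) to each parameter's gradient and momentum. Its meta-parameters `theta` are then tuned so that the loss after a short unrolled training run on a toy quadratic is low. The architecture, the features, the toy task, and the use of antithetic evolution strategies for meta-training are all illustrative assumptions, not the method from the paper.

```python
import math
import random

HIDDEN = 4        # hypothetical MLP width
BETA = 0.9        # hypothetical fixed momentum decay
OUT_SCALE = 0.01  # hypothetical output scale that keeps steps small


def num_meta_params() -> int:
    return 2 * HIDDEN + HIDDEN + HIDDEN  # w1 (HIDDEN x 2), b1, w2


def learned_step(theta, g, m):
    """Hypothetical per-parameter update: a 1-hidden-layer MLP on features (g, m)."""
    w1, b1, w2 = theta[:2 * HIDDEN], theta[2 * HIDDEN:3 * HIDDEN], theta[3 * HIDDEN:]
    out = 0.0
    for j in range(HIDDEN):
        out += w2[j] * math.tanh(w1[2 * j] * g + w1[2 * j + 1] * m + b1[j])
    return OUT_SCALE * out


def unrolled_loss(theta, curvatures, x0, steps=20):
    """Train the toy quadratic sum(a * x^2) for `steps` steps; return the final loss."""
    x = list(x0)
    m = [0.0] * len(x)  # one momentum accumulator per parameter: the memory overhead
    for _ in range(steps):
        for i, a in enumerate(curvatures):
            g = 2.0 * a * x[i]
            m[i] = BETA * m[i] + (1.0 - BETA) * g
            x[i] += learned_step(theta, g, m[i])  # one MLP call per parameter: the compute overhead
    return sum(a * xi * xi for a, xi in zip(curvatures, x))


def meta_train(theta, curvatures, x0, iters=50, sigma=0.05, meta_lr=0.1, seed=0):
    """Antithetic evolution strategies on the unrolled loss (one simple choice of meta-optimizer)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = unrolled_loss(plus, curvatures, x0) - unrolled_loss(minus, curvatures, x0)
        scale = diff / (2.0 * sigma)  # finite-difference slope along eps
        theta = [t - meta_lr * scale * e for t, e in zip(theta, eps)]
    return theta


if __name__ == "__main__":
    rng = random.Random(42)
    curvatures = [rng.uniform(0.5, 2.0) for _ in range(8)]
    x0 = [rng.gauss(0.0, 1.0) for _ in range(8)]
    theta0 = [rng.gauss(0.0, 0.5) for _ in range(num_meta_params())]
    theta = meta_train(theta0, curvatures, x0)
    print("loss before meta-training:", unrolled_loss(theta0, curvatures, x0))
    print("loss after meta-training: ", unrolled_loss(theta, curvatures, x0))
```

Even in this toy, the two costs named in the abstract are visible: the list `m` is per-parameter state that a stateless SGD would not need, and `learned_step` runs an MLP for every parameter at every step. Meta-training is a further cost, since each outer iteration requires two full unrolled training runs.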
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Metaheuristic Optimization Algorithms Research
Methods: Adam · Stochastic Gradient Descent
