Practical tradeoffs between memory, compute, and performance in learned optimizers

Luke Metz; C. Daniel Freeman; James Harrison; Niru Maheswaranathan; Jascha Sohl-Dickstein

arXiv:2203.11860·cs.LG·July 19, 2022·1 cites

Open Access · 1 Repo

TL;DR

This paper analyzes the trade-offs between memory, compute, and performance in learned optimizers, and introduces a new learned optimizer that is faster and more memory-efficient.

Contribution

It identifies key design features affecting resource trade-offs and develops a learned optimizer that improves on speed and memory efficiency.

Findings

01. Quantifies memory, compute, and performance trade-offs across many learned and hand-designed optimizers.

02. Develops a learned optimizer that is faster and more memory-efficient.

03. Provides open-source code for the proposed optimizer.

Abstract

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and…
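The two ideas in the abstract (an update rule given by a parametric function, and meta-training that function against a target loss) can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the tiny per-parameter MLP, its two input features (gradient and momentum), the quadratic inner problem, and the random-search hill climbing used for meta-training. It does not use the google/learned_optimization API.

```python
import math
import random


def init_theta(n_hidden=4, seed=0):
    """Random weights for a tiny MLP with 2 inputs: (gradient, momentum)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0.0, 0.5) for _ in range(n_hidden)],
    }


def update_fn(theta, grad, mom):
    """Parametric update rule: replaces the fixed formula `lr * grad`."""
    hidden = [
        math.tanh(row[0] * grad + row[1] * mom + b)
        for row, b in zip(theta["w1"], theta["b1"])
    ]
    return sum(w * h for w, h in zip(theta["w2"], hidden))


def inner_loss(params):
    """Hypothetical target task: a simple quadratic bowl."""
    return 0.5 * sum(p * p for p in params)


def inner_grad(params):
    """Gradient of the quadratic above."""
    return list(params)


def meta_loss(theta, n_steps=20, step_scale=0.1, decay=0.9):
    """Train the inner problem with the learned optimizer; return final loss."""
    params = [1.0, -2.0, 0.5]
    mom = [0.0] * len(params)  # optimizer state: one extra float per parameter
    for _ in range(n_steps):
        grads = inner_grad(params)
        mom = [decay * m + (1.0 - decay) * g for m, g in zip(mom, grads)]
        params = [
            p - step_scale * update_fn(theta, g, m)
            for p, g, m in zip(params, grads, mom)
        ]
    return inner_loss(params)


def perturb(theta, rng, scale):
    """Return a copy of theta with Gaussian noise added to every weight."""
    return {
        "w1": [[w + rng.gauss(0.0, scale) for w in row] for row in theta["w1"]],
        "b1": [b + rng.gauss(0.0, scale) for b in theta["b1"]],
        "w2": [w + rng.gauss(0.0, scale) for w in theta["w2"]],
    }


def meta_train(theta, n_iters=200, scale=0.1, seed=1):
    """Hill climbing on meta_loss: keep a candidate only if it is better."""
    rng = random.Random(seed)
    best, best_loss = theta, meta_loss(theta)
    for _ in range(n_iters):
        cand = perturb(best, rng, scale)
        cand_loss = meta_loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss


if __name__ == "__main__":
    theta0 = init_theta()
    trained, trained_loss = meta_train(theta0)
    print("meta-loss before:", meta_loss(theta0))
    print("meta-loss after: ", trained_loss)
```

Even in this toy, the overheads the abstract mentions are visible: each step evaluates the MLP once per parameter (compute), and the momentum accumulator adds one stored value per parameter (memory). Every meta-training iteration also reruns the whole inner training loop, which is why training a learned optimizer can be expensive.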

Code & Models

Repositories

google/learned_optimization (JAX, official)

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Metaheuristic Optimization Algorithms Research

Methods: Adam · Stochastic Gradient Descent