Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Paras Jain; Ajay Jain; Aniruddha Nrusimha; Amir Gholami; Pieter Abbeel; Kurt Keutzer; Ion Stoica; Joseph E. Gonzalez

arXiv:1910.02653·cs.LG·May 15, 2020·37 cites



TL;DR

Checkmate is a system that solves for optimal tensor rematerialization schedules, trading DNN training time against memory requirements so that networks can be trained with larger inputs on the same hardware.

Contribution

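We formalize the trade-off between DNN training time and memory as the tensor rematerialization optimization problem and develop Checkmate, a system that finds optimal schedules with off-the-shelf MILP solvers, or near-optimal schedules with an approximation algorithm, for complex architectures.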

Findings

01

Reduces training cost: a schedule is solved once, in under an hour, and reused across millions of training iterations

02

Enables training with up to 5.1x larger inputs

03

Scales to complex, realistic architectures and is hardware-aware through accelerator-specific, profile-based cost models

Abstract

We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal rematerialization schedules in reasonable times (under an hour) using off-the-shelf MILP solvers, or for near-optimal schedules using an approximation algorithm, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1x larger input sizes. Checkmate is an open-source project, available at https://github.com/parasj/checkmate.
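To make the time/memory trade-off in the abstract concrete, here is a minimal Python sketch of the simpler checkpointing problem that rematerialization generalizes. It is not Checkmate's MILP formulation or its API. It assumes a toy linear chain of layers, a simplified memory model (kept activations stay resident, and the largest run of consecutive dropped activations is rebuilt at once during the backward pass), and a brute-force search in place of a solver. The names `plan_cost` and `best_plan` are hypothetical.

```python
from itertools import combinations


def plan_cost(costs, mems, kept):
    """Return (peak_memory, recompute_cost) for a toy linear chain.

    costs[i] / mems[i] are the forward cost and activation size of layer i.
    kept is the set of layer indices whose activations stay in memory.
    Assumption: every dropped activation is recomputed exactly once.
    """
    n = len(costs)
    resident = sum(mems[i] for i in kept)
    recompute = sum(costs[i] for i in range(n) if i not in kept)
    # Largest run of consecutive dropped layers; in this toy model the whole
    # run is rematerialized together, so it bounds the extra working set.
    worst_run = run = 0
    for i in range(n):
        run = 0 if i in kept else run + mems[i]
        worst_run = max(worst_run, run)
    return resident + worst_run, recompute


def best_plan(costs, mems, budget):
    """Exhaustively find the kept set with the least recomputation that fits.

    Returns (kept_set, recompute_cost, peak_memory), or None if no plan fits.
    Brute force is exponential in the number of layers; it stands in for a
    real solver only to show the shape of the optimization problem.
    """
    n = len(costs)
    best = None
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            peak, extra = plan_cost(costs, mems, set(kept))
            if peak <= budget and (best is None or extra < best[1]):
                best = (set(kept), extra, peak)
    return best


if __name__ == "__main__":
    costs = [1, 1, 1, 1]   # hypothetical per-layer forward costs
    mems = [4, 4, 4, 4]    # hypothetical per-layer activation sizes
    for budget in (16, 12, 8):
        print(budget, best_plan(costs, mems, budget))
```

Under this toy model, tightening the budget from 16 to 12 forces two layers to be recomputed, and a budget of 8 admits no plan at all. The abstract describes Checkmate as solving a more general version of this problem with profile-based costs instead of made-up constants.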


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications