Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

TL;DR
Checkmate is a system that solves for optimal tensor rematerialization schedules, trading recomputation time against memory during DNN training so that larger models and inputs fit within an accelerator's memory budget.
Contribution
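We formalize tensor rematerialization as an optimization problem that generalizes prior checkpointing strategies, and develop Checkmate, a system that finds optimal schedules with off-the-shelf MILP solvers, or near-optimal schedules with an approximation algorithm, for complex architectures.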
Findings
Reduces training cost significantly
Enables training with up to 5.1x larger inputs
Scales to complex, realistic architectures, with hardware awareness from accelerator-specific, profile-based cost models
Abstract
We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal rematerialization schedules in reasonable times (under an hour) using off-the-shelf MILP solvers or near-optimal schedules with an approximation algorithm, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1x larger input sizes. Checkmate is an open-source project, available at https://github.com/parasj/checkmate.
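The time-versus-memory trade-off described in the abstract can be illustrated with a toy model. The sketch below is not Checkmate's MILP formulation, its approximation algorithm, or its API. It assumes a purely linear chain of layers, charges each dropped activation exactly one recomputation during the backward pass, conservatively treats every kept activation as resident for the whole backward pass, and replaces the solver with a brute-force search that is only feasible for very small chains. The names `evaluate` and `best_schedule` are hypothetical.

```python
from itertools import combinations


def evaluate(keep, mem, cost):
    """Return (peak_memory, recompute_cost) for a set of kept activation indices.

    Simplifying assumptions (not from the paper):
      * the network is a linear chain and its input is always available,
      * every dropped activation is recomputed exactly once,
      * all kept activations stay in memory for the whole backward pass,
      * a run of consecutive dropped activations is recomputed and held together.
    """
    n = len(mem)
    kept_mem = sum(mem[i] for i in keep)
    recompute = sum(cost[i] for i in range(n) if i not in keep)

    # Memory of the largest run of consecutive dropped activations.
    worst_run = 0
    run = 0
    for i in range(n):
        if i in keep:
            run = 0
        else:
            run += mem[i]
            worst_run = max(worst_run, run)

    return kept_mem + worst_run, recompute


def best_schedule(mem, cost, budget):
    """Exhaustively search kept sets; return (keep, recompute_cost, peak) or None."""
    n = len(mem)
    best = None
    for k in range(n + 1):
        for keep in combinations(range(n), k):
            peak, extra = evaluate(set(keep), mem, cost)
            if peak <= budget and (best is None or extra < best[1]):
                best = (keep, extra, peak)
    return best


if __name__ == "__main__":
    # Nine layers with unit memory and unit compute cost, memory budget of 5.
    print(best_schedule([1] * 9, [1] * 9, budget=5))
```

In this toy setting, tightening the budget forces more activations to be dropped and recomputed, and a budget that is too small yields no feasible schedule at all. A real system would additionally need per-operator costs measured on the target accelerator and support for non-linear graphs, which this sketch does not attempt.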
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications
