Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont

TL;DR
Rockmate is an automatic tool that controls memory usage during PyTorch model training by re-computing activations. It applies to a wide range of model architectures and adds only a small overhead.
Contribution
The paper introduces Rockmate, a tool that automatically rewrites a PyTorch model as a sequence of blocks so that activation memory during training stays within a predefined budget.
Findings
Reduces activation memory by a factor of 2 to 5
Achieves speed comparable to Rotor and efficiency similar to Checkmate
Maintains low overhead of 10-20% during training
Abstract
We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model, using a predefined amount of memory for activations, at the cost of a few re-computations. Rockmate automatically detects the structure of computational and data dependencies and rewrites the initial model as a sequence of complex blocks. We show that such a structure is widespread and can be found in many models in the literature (Transformer based models, ResNet, RegNets,...). This structure allows us to solve the problem in a fast and efficient way, using an adaptation of Checkmate (too slow on the whole model but general) at the level of individual blocks and an adaptation of Rotor (fast but limited to sequential models) at the level of the sequence itself. We show through experiments on many models…
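To make the memory versus re-computation trade-off described in the abstract concrete, here is a minimal sketch in plain Python. It is not Rockmate's algorithm, nor Rotor's or Checkmate's: it only simulates uniform periodic checkpointing on a simple chain of layers, under the assumptions that every activation has unit size and every layer's forward pass has unit cost. The function name `simulate_chain` and its parameters are hypothetical and exist for illustration only.

```python
def simulate_chain(n_layers: int, segment: int) -> dict:
    """Count peak stored activations and forward evaluations for one training
    step on a chain of `n_layers` layers, keeping one checkpoint per segment.

    Assumed cost model (illustrative only): each activation has size 1 and
    each forward evaluation of a layer costs 1.
    """
    if n_layers < 1 or segment < 1:
        raise ValueError("n_layers and segment must be positive")

    # Forward pass: keep only the input of each segment.
    checkpoints = list(range(0, n_layers, segment))
    forward_evals = n_layers
    live = len(checkpoints)  # activations currently held in memory
    peak = live

    # Backward pass: handle segments from the last one to the first one.
    for start in reversed(checkpoints):
        end = min(start + segment, n_layers)
        # Re-run the forward pass inside the segment to rebuild the inputs
        # of layers start+1 .. end-1 from the checkpoint at `start`.
        recomputed = end - start - 1
        forward_evals += recomputed
        live += recomputed
        peak = max(peak, live)
        # Backpropagating through the segment frees its recomputed
        # activations and the checkpoint itself.
        live -= recomputed + 1

    return {"peak_activations": peak, "forward_evals": forward_evals}


if __name__ == "__main__":
    # segment=1 stores every activation (no re-computation).
    print(simulate_chain(16, 1))
    # segment=4 keeps 4 checkpoints and rebuilds 3 activations per segment.
    print(simulate_chain(16, 4))
```

In this toy model, a 16-layer chain with a segment length of 4 holds at most 7 activations instead of 16, in exchange for 12 extra forward evaluations. Tools such as the one described here search for much better schedules than this uniform one, and handle non-uniform costs and non-sequential blocks.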
Taxonomy
Topics: Scientific Computing and Data Management · Parallel Computing and Optimization Techniques · Topic Modeling
Methods: Average Pooling · Batch Normalization · Residual Block · Max Pooling · Residual Connection · Global Average Pooling · Kaiming Initialization · 1x1 Convolution · Bottleneck Residual Block
