Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training
Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko

TL;DR
Echo is a compiler-based optimization that reduces GPU memory usage during LSTM RNN training by recomputing selected feature maps instead of keeping them in GPU memory, enabling larger models and faster training without source code changes.
Contribution
Echo introduces a compiler scheme that estimates the overhead of each recomputation and uses that estimate to decide what to recompute, reducing the memory footprint during training.
Findings
Achieves an average memory footprint reduction of 1.89X
Maximum reduction of 3.13X in experiments
Enables larger batch sizes and energy savings
Abstract
Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs) are a popular class of machine learning models for analyzing sequential data. Their training on modern GPUs, however, is limited by the GPU memory capacity. Our profiling results of the LSTM RNN-based Neural Machine Translation (NMT) model reveal that feature maps of the attention and RNN layers form the memory bottleneck and that runtime is unevenly distributed across different layers when training on GPUs. Based on these two observations, we propose to recompute the feature maps rather than stashing them persistently in the GPU memory. While the idea of feature map recomputation has been considered before, existing solutions fail to deliver satisfactory footprint reduction, as they do not address two key challenges. For each feature map recomputation to be effective and efficient, its effect on (1) the total memory…
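The general idea of recomputing feature maps rather than stashing them can be illustrated with a small sketch. This is not Echo's compiler pass, which works on the training graph and estimates the cost of each recomputation. It is only a hand-written toy chain of scalar tanh layers in plain Python, and every name in it (`layer`, `backward_stash_all`, `backward_recompute`, `segment`) is hypothetical. It compares a backward pass that keeps every layer input with one that keeps a single input per segment and replays the forward pass for the rest.

```python
import math


def layer(x, w):
    """One toy layer (assumed for illustration): y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(x, w, dy):
    """Backward of the toy layer: returns (dx, dw) for upstream gradient dy."""
    y = math.tanh(w * x)
    dz = dy * (1.0 - y * y)
    return dz * w, dz * x


def backward_stash_all(x0, weights):
    """Baseline: every layer input stays in memory until the backward pass."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(x, w)
    dy, grads = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        dy, grads[i] = layer_grad(inputs[i], weights[i], dy)
    return grads, len(inputs)  # second value: number of stashed inputs


def backward_recompute(x0, weights, segment):
    """Keep one input per segment; recompute the others during backward."""
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # only segment boundaries are stashed
        x = layer(x, w)
    dy, grads, peak = 1.0, [0.0] * n, 0
    for start in reversed(range(0, n, segment)):
        end = min(start + segment, n)
        # Replay the forward pass inside this segment to rebuild its inputs.
        inputs = [checkpoints[start]]
        for i in range(start, end - 1):
            inputs.append(layer(inputs[-1], weights[i]))
        # The first entry of `inputs` is the checkpoint itself, so count it once.
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for i in reversed(range(start, end)):
            dy, grads[i] = layer_grad(inputs[i - start], weights[i], dy)
    return grads, peak


if __name__ == "__main__":
    ws = [0.5 + 0.1 * i for i in range(16)]
    _, stashed = backward_stash_all(0.3, ws)
    _, peak = backward_recompute(0.3, ws, 4)
    print(stashed, peak)
```

Both functions return the same gradients; the second holds fewer values at once and pays for it with an extra forward replay per segment. In this toy every layer costs the same to recompute, so the sketch makes no attempt to weigh memory savings against runtime overhead per feature map.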
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
