Efficient Memory Management for GPU-based Deep Learning Systems
Junzhe Zhang, Sai Ho Yeung, Yao Shu, Bingsheng He, Wei Wang

TL;DR
This paper introduces two system-level memory management techniques for GPU-based deep learning: a memory pool driven by variable lifetime analysis, and automatic variable swapping. Both reduce GPU memory usage without affecting model accuracy.
Contribution
The paper presents two orthogonal, model-agnostic approaches to efficient GPU memory management in deep learning systems: lifetime-based memory pooling and variable swapping.
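To make the idea of lifetime-based pooling concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes each variable's size and lifetime (first and last use within one training iteration) are already known, and it uses a simple greedy first-fit placement. The function name `plan_offsets` and the tuple layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (name, size_bytes, first_use, last_use).
# Lifetimes are inclusive step indices within one training iteration.
Var = Tuple[str, int, int, int]


def plan_offsets(variables: List[Var]) -> Tuple[Dict[str, int], int]:
    """Greedy first-fit placement (an assumed heuristic, not the paper's).

    Larger variables are placed first, each at the lowest offset that does
    not collide with an already placed variable whose lifetime overlaps.
    Returns the offset of every variable and the total pool size.
    """
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, first, last)
    offsets: Dict[str, int] = {}
    pool_size = 0
    for name, size, first, last in sorted(variables, key=lambda v: -v[1]):
        # Address ranges held by variables that are alive at the same time.
        busy = sorted(
            (o, o + s) for o, s, f, l in placed if not (l < first or last < f)
        )
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # the variable fits in the gap before this block
            offset = max(offset, end)
        placed.append((offset, size, first, last))
        offsets[name] = offset
        pool_size = max(pool_size, offset + size)
    return offsets, pool_size


# "a" and "b" are never alive at the same time, so they share one offset.
demo = [("a", 4, 0, 1), ("b", 4, 2, 3), ("c", 2, 1, 2)]
offsets, pool_size = plan_offsets(demo)
print(offsets, pool_size)  # pool of 6 bytes instead of 4 + 4 + 2 = 10
```

Because training repeats the same allocation pattern every iteration, a plan like this could in principle be computed once and reused, which is the property the paper relies on.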
Findings
Memory reduction of up to 13.3% with the proposed memory pool.
Memory footprint reduction of up to 34.2% through variable swapping.
Approaches do not degrade model accuracy or require manual intervention.
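The effect of variable swapping on peak memory can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the paper's method: transfer time between GPU and host is ignored, a variable is moved off the GPU whenever the gap between two consecutive accesses is at least `min_gap` steps, and the names `resident` and `peak_memory` are hypothetical.

```python
from typing import Dict, List, Optional


def resident(accs: List[int], t: int, min_gap: Optional[int]) -> bool:
    """Is a variable with sorted access steps `accs` on the GPU at step t?"""
    if t < accs[0] or t > accs[-1]:
        return False  # not yet allocated, or already freed
    if min_gap is None:
        return True  # swapping disabled
    for a, b in zip(accs, accs[1:]):
        # Assumption: swapped out strictly between two distant accesses.
        if a < t < b and b - a >= min_gap:
            return False
    return True


def peak_memory(sizes: Dict[str, int], accesses: Dict[str, List[int]],
                steps: int, min_gap: Optional[int] = None) -> int:
    """Largest total size of GPU-resident variables over all steps."""
    return max(
        sum(sizes[v] for v in sizes if resident(accesses[v], t, min_gap))
        for t in range(steps)
    )


# Forward pass touches x1, x2, x3 at steps 0..2; backward pass revisits
# them in reverse order at steps 3..5.
sizes = {"x1": 4, "x2": 4, "x3": 4}
accesses = {"x1": [0, 5], "x2": [1, 4], "x3": [2, 3]}

print(peak_memory(sizes, accesses, steps=6))             # 12, no swapping
print(peak_memory(sizes, accesses, steps=6, min_gap=4))  # 8, only x1 swapped
print(peak_memory(sizes, accesses, steps=6, min_gap=3))  # 4, x1 and x2 swapped
```

A real system also has to decide when to start each transfer so that it overlaps with computation; this sketch leaves that scheduling question out.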
Abstract
GPUs (graphics processing units) are used for many data-intensive applications, and deep learning systems are now among their most important consumers. As deep learning applications adopt deeper and larger models to achieve higher accuracy, memory management has become an important research topic, given the limited size of GPU memory. Many approaches have been proposed to address this issue, e.g., model compression and memory swapping. However, they either degrade model accuracy or require substantial manual intervention. In this paper, we propose two orthogonal approaches that reduce memory cost from the system perspective. Our approaches are transparent to the models and thus do not affect model accuracy. They are achieved by exploiting the iterative nature of the training algorithm of deep learning to derive the…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Parallel Computing and Optimization Techniques
