Efficient Memory Management for GPU-based Deep Learning Systems

Junzhe Zhang; Sai Ho Yeung; Yao Shu; Bingsheng He; Wei Wang

arXiv:1903.06631 · cs.DC · March 18, 2019 · 19 citations

Open Access

TL;DR

This paper introduces two system-level memory management techniques for GPU-based deep learning, a lifetime-based memory pool and variable swapping, which reduce memory usage by up to 13.3% and 34.2% respectively without affecting model accuracy.

Contribution

The paper presents two orthogonal, model-agnostic approaches to efficient GPU memory management in deep learning systems: lifetime-based memory pooling and variable swapping.
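To make the idea of lifetime-based pooling concrete, here is a minimal sketch, assuming each variable's size and lifetime (first and last operation index within one training iteration) are already known. It uses a generic greedy first-fit placement; it is not the paper's algorithm, and the names `Var`, `plan_offsets`, and `pool_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    size: int   # bytes
    start: int  # index of the first operation that uses the variable
    end: int    # index of the last operation that uses it (inclusive)


def overlaps(a: Var, b: Var) -> bool:
    """True if the two variables are alive at the same time."""
    return a.start <= b.end and b.start <= a.end


def plan_offsets(variables):
    """Greedy first-fit placement (an illustrative heuristic, not the paper's).

    Larger variables are placed first, each at the lowest offset that does
    not collide with any already placed variable whose lifetime overlaps.
    """
    placed = []   # (variable, offset) pairs decided so far
    offsets = {}
    for v in sorted(variables, key=lambda v: (-v.size, v.start, v.name)):
        # Address ranges taken by variables that are alive alongside v.
        busy = sorted((off, off + u.size) for u, off in placed if overlaps(u, v))
        offset = 0
        for lo, hi in busy:
            if offset + v.size <= lo:
                break  # v fits in the gap before this range
            offset = max(offset, hi)
        placed.append((v, offset))
        offsets[v.name] = offset
    return offsets


def pool_size(variables, offsets):
    """Total bytes the pool needs for the given placement."""
    return max((offsets[v.name] + v.size for v in variables), default=0)


demo_vars = [
    Var("a", size=4, start=0, end=1),
    Var("b", size=4, start=2, end=3),  # never alive together with "a"
    Var("c", size=2, start=0, end=3),  # alive alongside both
]
demo_offsets = plan_offsets(demo_vars)
demo_pool = pool_size(demo_vars, demo_offsets)
```

In this toy case `a` and `b` share the same address range because their lifetimes never overlap, so the pool needs 6 bytes rather than the 10 bytes required if every variable had its own region.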

Findings

01. The proposed memory pool reduces memory usage by up to 13.3%.

02. Variable swapping reduces the memory footprint by up to 34.2%.

03. Neither approach degrades model accuracy or requires manual intervention.
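The variable-swapping finding can be illustrated with a small sketch. It assumes a fixed access order per iteration, a uniform time per operation, and a single host-device bandwidth figure; all three are simplifying assumptions, and the selection rule is a generic one rather than any strategy taken from the paper. The names `SwapPlan` and `plan_swaps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapPlan:
    name: str
    out_after: int  # copy to host memory after this operation index
    in_before: int  # must be back on the GPU before this operation index


def plan_swaps(accesses, sizes, op_time, bandwidth):
    """Pick idle gaps long enough to hide a round trip to host memory.

    accesses:  variable name touched by each operation, in execution order
    sizes:     bytes per variable
    op_time:   seconds per operation (assumed uniform for simplicity)
    bandwidth: host-device transfer rate in bytes per second (assumed)
    """
    last_seen = {}
    plans = []
    for i, name in enumerate(accesses):
        if name in last_seen:
            prev = last_seen[name]
            idle = (i - prev - 1) * op_time          # time between the two uses
            round_trip = 2 * sizes[name] / bandwidth  # swap out, then swap in
            if idle >= round_trip:
                plans.append(SwapPlan(name, out_after=prev, in_before=i))
        last_seen[name] = i
    return plans


demo_plans = plan_swaps(
    accesses=["x", "y", "z", "y", "x"],
    sizes={"x": 100, "y": 100, "z": 10},
    op_time=1.0,
    bandwidth=100.0,
)
```

Here `x` is idle for three operations, which covers its two-second round trip, so it is selected; `y` is idle for only one operation and stays on the GPU. A real system would also need measured per-operation timings and a check that the transfers do not contend with each other.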

Abstract

GPUs (graphics processing units) have been used for many data-intensive applications. Among these, deep learning systems are now some of the most important consumers of GPU resources. As deep learning applications adopt deeper and larger models to achieve higher accuracy, memory management becomes an important research topic, given that GPUs have limited memory. Many approaches have been proposed to address this issue, e.g., model compression and memory swapping. However, they either degrade model accuracy or require substantial manual intervention. In this paper, we propose two orthogonal approaches to reduce the memory cost from the system perspective. Our approaches are transparent to the models and thus do not affect model accuracy. They are achieved by exploiting the iterative nature of the training algorithm of deep learning to derive the…

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Parallel Computing and Optimization Techniques