DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

Yu Tang; Chenyu Wang; Yufan Zhang; Yuliang Liu; Xingcheng Zhang; Linbo Qiao; Zhiquan Lai; Dongsheng Li

arXiv:2203.15980 · cs.LG · June 22, 2022 · 6 cites

TL;DR

DELTA introduces a dynamic scheduler that combines tensor swapping and recomputation to reduce GPU memory usage in deep learning training, enabling larger batch sizes (2.04× for ResNet-50 and 2.25× for ResNet-101) with comparable convergence and an acceptable time delay.

Contribution

To the authors' knowledge, DELTA is the first dynamic runtime scheduler to integrate tensor swapping and tensor recomputation without user intervention, improving memory efficiency and training capacity.

Findings

01 Saves 40%-70% of GPU memory compared to previous methods.

02 Enables 2.04× and 2.25× larger batch sizes for ResNet-50 and ResNet-101, respectively.

03 Maintains comparable convergence with an acceptable time delay.

Abstract

The further development of deep neural networks is hampered by limited GPU memory, so optimizing GPU memory usage is in high demand. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. However, in this emerging domain, several challenges remain: 1) The efficiency of recomputation is limited for both static and dynamic methods. 2) Swapping requires offloading parameters manually, which incurs a great time cost. 3) No existing dynamic, fine-grained method combines tensor swapping with tensor recomputation. To remedy these issues, we propose a novel scheduler manager named DELTA (Dynamic tEnsor offLoad and recompuTAtion). To the best of our knowledge, we are the first to build a reasonable dynamic runtime scheduler that combines tensor swapping and tensor recomputation…
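
The trade-off the abstract describes, freeing GPU memory by either offloading a tensor (swapping) or discarding it and rebuilding it later (recomputation), can be illustrated with a toy cost model. The sketch below is not DELTA's algorithm; the truncated abstract does not specify its policy. The `Tensor` fields, the greedy ordering, the assumed 12 MB/ms transfer bandwidth, and all tensor names and numbers are hypothetical, chosen only to show how a runtime might pick the cheaper restore path per tensor.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size_mb: float        # memory freed if this tensor leaves the GPU
    recompute_ms: float   # hypothetical cost to rebuild it from its inputs


def restore_cost_ms(t, bandwidth_mb_per_ms):
    """Return the cheaper way to bring a tensor back, and its cost in ms."""
    swap_ms = t.size_mb / bandwidth_mb_per_ms  # time to copy it back from host
    if swap_ms <= t.recompute_ms:
        return "swap", swap_ms
    return "recompute", t.recompute_ms


def plan_evictions(tensors, need_mb, bandwidth_mb_per_ms=12.0):
    """Toy greedy policy: evict tensors with the lowest restore cost per MB
    first, until at least need_mb has been freed."""
    scored = []
    for t in tensors:
        action, cost = restore_cost_ms(t, bandwidth_mb_per_ms)
        scored.append((cost / t.size_mb, t, action, cost))
    scored.sort(key=lambda item: item[0])

    plan, freed = [], 0.0
    for _, t, action, cost in scored:
        if freed >= need_mb:
            break
        plan.append((t.name, action, cost))
        freed += t.size_mb
    return plan, freed


# Hypothetical activations from a small network.
tensors = [
    Tensor("conv1_out", size_mb=240.0, recompute_ms=30.0),  # costly to rebuild
    Tensor("relu1_out", size_mb=240.0, recompute_ms=0.5),   # cheap to rebuild
    Tensor("fc_out", size_mb=12.0, recompute_ms=0.1),       # small and cheap
]
plan, freed = plan_evictions(tensors, need_mb=300.0)
for name, action, cost in plan:
    print(f"{name}: {action} ({cost:.1f} ms to restore)")
print(f"freed {freed:.0f} MB")
```

With these made-up numbers, the cheap element-wise output is marked for recomputation while the expensive convolution output is marked for swapping, which is the kind of per-tensor decision a combined scheduler has to make. A real scheduler would also need to account for overlap between transfers and compute, tensor lifetimes, and dependencies between recomputed tensors, none of which this sketch models.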

Code & Models

Repositories

TonyTangYu/delta-examples (PyTorch, official)

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques