DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang, Chenyu Wang, Yufan Zhang, Yuliang Liu, Xingcheng Zhang, Linbo Qiao, Zhiquan Lai, Dongsheng Li

TL;DR
DELTA introduces a dynamic scheduler that combines tensor swapping and recomputation to significantly reduce GPU memory usage in deep learning, enabling larger batch sizes with minimal performance loss.
Contribution
To the authors' knowledge, DELTA is the first dynamic runtime scheduler that integrates tensor swapping and recomputation without user intervention, improving memory efficiency and training capacity.
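The following minimal sketch makes the idea of combining the two techniques concrete: for each tensor evicted from the GPU, a scheduler picks either swapping or recomputation. It is not DELTA's published policy. The `TensorInfo` record, the `choose_action` function, the 12 GB/s bandwidth default, and the rule of comparing round-trip transfer time against recomputation time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TensorInfo:
    name: str
    size_bytes: int      # bytes the tensor occupies on the GPU
    recompute_ms: float  # estimated time to regenerate it from its inputs


def choose_action(t: TensorInfo, pcie_gb_per_s: float = 12.0) -> str:
    """Return "swap" or "recompute" for a tensor selected for eviction.

    Hypothetical cost model: swapping costs a full round trip over PCIe
    (offload now, prefetch later) with no overlap with computation, while
    recomputation costs the tensor's estimated forward time.
    """
    bytes_per_ms = pcie_gb_per_s * 1e9 / 1e3
    swap_ms = 2 * t.size_bytes / bytes_per_ms
    return "swap" if swap_ms < t.recompute_ms else "recompute"
```

Under these assumptions, a 64 MiB activation that takes 5 ms to recompute (about 11.2 ms to swap) is marked for recomputation, while a 16 MiB tensor that takes 8 ms to recompute (about 2.8 ms to swap) is swapped. A practical scheduler would also account for transfers that overlap with computation, which this toy model ignores.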
Findings
Saves 40%-70% GPU memory compared to previous methods.
Enables 2.04× and 2.25× larger batch sizes for ResNet-50 and ResNet-101.
Maintains comparable convergence with an acceptable time overhead.
Abstract
The further development of deep neural networks is hampered by limited GPU memory, so optimizing GPU memory usage is in high demand. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. However, this is an emerging domain, and several challenges remain: 1) the efficiency of recomputation is limited for both static and dynamic methods; 2) swapping requires offloading parameters manually, which incurs a large time cost; 3) there is currently no dynamic, fine-grained method that combines tensor swapping with tensor recomputation. To remedy these issues, we propose a novel scheduler manager named DELTA (Dynamic tEnsor offLoad and recompuTAtion). To the best of our knowledge, we are the first to build a dynamic runtime scheduler that combines tensor swapping and tensor recomputation…
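The abstract describes a runtime scheduler that decides, while training runs, which tensors leave GPU memory and how they are restored. The toy `TensorPool` below sketches that general pattern under a fixed byte budget. Least-recently-used eviction and the per-tensor `cheap_to_recompute` flag (a stand-in for a real cost model) are assumptions, not details taken from the paper, and events are only logged: no actual memory is moved and nothing is actually recomputed.

```python
from collections import OrderedDict


class TensorPool:
    """Toy runtime pool that keeps resident tensors under a byte budget.

    Hypothetical illustration, not DELTA's implementation: tensors are evicted
    in least-recently-used order; tensors flagged as cheap to recompute are
    dropped, all others are swapped out to host memory.
    """

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.on_gpu = OrderedDict()  # name -> (size, cheap_to_recompute), oldest first
        self.swapped = {}            # name -> (size, cheap_to_recompute), held in host memory
        self.dropped = {}            # name -> (size, cheap_to_recompute), regenerated on access
        self.log = []                # (event, name) pairs, in order

    def used(self) -> int:
        return sum(size for size, _ in self.on_gpu.values())

    def _make_room(self, needed: int) -> None:
        # Evict least-recently-used tensors until `needed` more bytes fit.
        while self.on_gpu and self.used() + needed > self.budget:
            name, (size, cheap) = self.on_gpu.popitem(last=False)
            if cheap:
                self.dropped[name] = (size, cheap)
                self.log.append(("drop", name))
            else:
                self.swapped[name] = (size, cheap)
                self.log.append(("swap_out", name))
        if self.used() + needed > self.budget:
            raise MemoryError(f"{needed} bytes do not fit in the budget")

    def put(self, name: str, size: int, cheap_to_recompute: bool) -> None:
        # Register a newly produced tensor, evicting others if necessary.
        self._make_room(size)
        self.on_gpu[name] = (size, cheap_to_recompute)

    def get(self, name: str) -> None:
        # Access a tensor, bringing it back to the GPU if it was evicted.
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # mark as most recently used
            return
        if name in self.swapped:
            store, event = self.swapped, "swap_in"
        elif name in self.dropped:
            store, event = self.dropped, "recompute"
        else:
            raise KeyError(name)
        size, cheap = store[name]
        self._make_room(size)  # evict others first; the entry stays recorded if this fails
        del store[name]
        self.log.append((event, name))
        self.on_gpu[name] = (size, cheap)
```

For example, with a 100-byte budget and three 40-byte tensors `act1` (cheap), `act2` (expensive), and `act3` (cheap) put in that order, the third `put` drops `act1`. Accessing `act1` then swaps out `act2` and recomputes `act1`, and accessing `act2` afterwards drops `act3` and swaps `act2` back in. Because the decision is made per tensor at run time, no user annotation is needed beyond whatever cost information the flag summarizes.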
