GPU Memory Usage Optimization for Backward Propagation in Deep Network Training

Ding-Yong Hong; Tzu-Hsien Tsai; Ning Wang; Pangfeng Liu; Jan-Jan Wu

arXiv:2502.12499 · cs.LG · February 19, 2025


TL;DR

This paper proposes an efficient dynamic programming algorithm and a linear-time algorithm for optimal checkpoint selection that minimize GPU memory usage during deep neural network training, balancing memory savings against computational overhead.

Contribution

The paper introduces a theoretical framework and two algorithms for selecting an optimal checkpoint subset, significantly improving memory efficiency during training.

Findings

01

The O(n^3) algorithm effectively finds the optimal checkpoints.

02

The revised O(n) algorithm achieves similar results with reduced complexity.

03

Experimental results demonstrate substantial memory savings during training.
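The findings above concern choosing an optimal checkpoint subset. The paper's own O(n^3) and O(n) algorithms and its exact cost model are not reproduced here. Instead, the following is a hypothetical, deliberately simple dynamic program under an assumed cost model: in a linear chain of layers, peak memory is the total size of the checkpoints plus the size of the largest run of non-checkpointed layers. It tries every possible run bound and, for each one, finds the cheapest checkpoint set that respects it, giving an O(n^4) method. A brute-force search is included to cross-check it on small inputs. All function names (`peak_memory`, `optimal_checkpoints`, `brute_force_peak`) are illustrative.

```python
from itertools import accumulate, combinations
from typing import List, Sequence, Set, Tuple

INF = float("inf")


def peak_memory(sizes: Sequence[int], checkpoints: Set[int]) -> int:
    """Assumed model: resident checkpoints + largest run of non-checkpointed layers."""
    resident = sum(sizes[i] for i in checkpoints)
    run = worst = 0
    for i, s in enumerate(sizes):
        run = 0 if i in checkpoints else run + s
        worst = max(worst, run)
    return resident + worst


def optimal_checkpoints(sizes: Sequence[int]) -> Tuple[int, Set[int]]:
    """Return (peak, checkpoint set) minimizing peak_memory; an O(n^4) illustration."""
    n = len(sizes)
    prefix = [0, *accumulate(sizes)]  # prefix[j] == sum(sizes[:j])

    def seg(k: int, j: int) -> int:  # sum(sizes[k:j])
        return prefix[j] - prefix[k]

    everything = set(range(n))
    best: Tuple[int, Set[int]] = (peak_memory(sizes, everything), everything)
    bounds = sorted({seg(k, j) for k in range(n + 1) for j in range(k, n + 1)})
    for bound in bounds:
        # dp[j]: cheapest checkpoint memory for layers 0..j-1 with layer j-1 checkpointed;
        # j == 0 stands for the network input, assumed always available.
        dp: List[float] = [INF] * (n + 1)
        parent = [-1] * (n + 1)
        dp[0] = 0
        for j in range(1, n + 1):
            for k in range(j):  # previous checkpoint is layer k-1; the run is k..j-2
                cand = dp[k] + sizes[j - 1]
                if dp[k] < INF and seg(k, j - 1) <= bound and cand < dp[j]:
                    dp[j], parent[j] = cand, k
        # The trailing run after the last checkpoint must also respect the bound.
        ends = [k for k in range(n + 1) if dp[k] < INF and seg(k, n) <= bound]
        if not ends:
            continue
        j = min(ends, key=lambda k: dp[k])
        chosen: Set[int] = set()
        while j > 0:  # follow parent pointers back toward the input
            chosen.add(j - 1)
            j = parent[j]
        peak = peak_memory(sizes, chosen)
        if peak < best[0]:
            best = (peak, chosen)
    return best


def brute_force_peak(sizes: Sequence[int]) -> int:
    """Exhaustive check over all 2^n subsets; only practical for small n."""
    n = len(sizes)
    return min(peak_memory(sizes, set(c))
               for r in range(n + 1) for c in combinations(range(n), r))


if __name__ == "__main__":
    print(optimal_checkpoints([4, 4, 4, 4, 4, 4])[0])  # 16
    print(brute_force_peak([4, 4, 4, 4, 4, 4]))        # 16
```

Fixing the bound first turns the "sum plus maximum" objective into a plain minimum-cost DP. Because the optimal solution's largest run is itself one of the candidate bounds, taking the best result over all bounds recovers the optimum under this assumed model.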

Abstract

In modern deep learning, there has been a trend toward designing larger Deep Neural Networks (DNNs) to execute more complex tasks with better accuracy. At the same time, Convolutional Neural Networks (CNNs) have become the standard method for most computer vision tasks. However, allocating memory for the intermediate data in convolution layers can cause severe memory pressure during model training. Many solutions have been proposed to address this problem. Besides hardware-dependent solutions, a general methodology called rematerialization can reduce GPU memory usage by efficiently trading computation for memory. The idea is to select a set of intermediate results during the forward phase as checkpoints and keep only those in memory. The backward phase then recomputes the intermediate data from the closest checkpoints in memory as needed. This recomputation increases…
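The checkpoint-and-recompute idea described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are ours, not the paper's: the network is a linear chain, layer i produces one intermediate result of size `sizes[i]`, the network input is always available, gradients and weights are ignored, and each recomputed segment is held in memory in full while its backward pass runs. The function names `segments_between` and `simulate_backward` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def segments_between(n: int, checkpoints: Set[int]) -> List[List[int]]:
    """Group the non-checkpoint layer indices 0..n-1 into maximal contiguous runs."""
    segs: List[List[int]] = []
    current: List[int] = []
    for i in range(n):
        if i in checkpoints:
            if current:
                segs.append(current)
            current = []
        else:
            current.append(i)
    if current:
        segs.append(current)
    return segs


def simulate_backward(sizes: Sequence[int], checkpoints: Set[int]) -> Tuple[int, int]:
    """Return (peak_memory, recomputed_layers) for one backward pass (assumed model)."""
    n = len(sizes)
    kept = {i for i in checkpoints if 0 <= i < n}
    resident = sum(sizes[i] for i in kept)  # checkpoints saved by the forward phase
    peak = resident
    recomputed = 0
    # Backward visits segments from the output toward the input; each segment is
    # rebuilt from the checkpoint just before it (or from the network input).
    for seg in reversed(segments_between(n, kept)):
        recomputed += len(seg)
        peak = max(peak, resident + sum(sizes[i] for i in seg))
    return peak, recomputed


if __name__ == "__main__":
    sizes = [4, 4, 4, 4, 4, 4]
    print(simulate_backward(sizes, set(range(6))))  # store everything: (24, 0)
    print(simulate_backward(sizes, {2}))            # one checkpoint:   (16, 5)
```

With six equal layers of size 4, storing every intermediate result costs 24 units and recomputes nothing, while a single checkpoint at index 2 lowers the estimated peak to 16 units at the price of recomputing 5 layers. This is the computation-for-memory trade the abstract describes.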


Taxonomy

Methods: Convolution · Sparse Evolutionary Training · Focus