Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
Daniel Barley, Holger Fröning

TL;DR
This paper proposes compressing activation maps during backpropagation in neural network training to reduce memory usage, enabling training on smaller GPUs without sacrificing accuracy, though with longer training times.
Contribution
It introduces a method that uses pooling to compress the activations stored for the backward pass, decreasing the memory footprint of DNN training.
Findings
Achieved 29% reduction in peak memory consumption
Maintained prediction accuracy with compressed activations
Longer training schedule required due to compression
Abstract
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators, such as GPUs or TPUs. Next to the vast number of floating point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation. These make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce…
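To make the idea in the abstract concrete, here is a minimal pure-Python sketch of storing a pooled copy of a layer's input for the backward pass. It is not the authors' implementation: the toy layer `ScaleLayer`, the helpers `compress` and `decompress`, the 1-D average pooling, and the nearest-neighbour upsampling are all assumptions chosen for illustration.

```python
def compress(x, k):
    """Average-pool a 1-D activation list with non-overlapping windows of size k."""
    assert len(x) % k == 0, "sketch assumes the length is divisible by k"
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]


def decompress(pooled, k):
    """Nearest-neighbour upsampling: repeat each pooled value k times."""
    return [v for v in pooled for _ in range(k)]


class ScaleLayer:
    """Hypothetical toy layer y_i = w * x_i that keeps only a pooled copy of x."""

    def __init__(self, w, k):
        self.w = w
        self.k = k
        self.saved = None

    def forward(self, x):
        # Store k times fewer values than the full input activation.
        self.saved = compress(x, self.k)
        return [self.w * v for v in x]

    def backward(self, grad_y):
        # Reconstruct an approximation of x from the pooled copy.
        x_hat = decompress(self.saved, self.k)
        # The weight gradient needs x, so it becomes approximate.
        grad_w = sum(g * xh for g, xh in zip(grad_y, x_hat))
        # The input gradient only needs w, so it stays exact.
        grad_x = [self.w * g for g in grad_y]
        return grad_x, grad_w


layer = ScaleLayer(w=2.0, k=2)
x = [1.0, 3.0, 2.0, 2.0]
y = layer.forward(x)
grad_x, grad_w = layer.backward([1.0, 0.0, 0.0, 0.0])
exact_grad_w = sum(g * v for g, v in zip([1.0, 0.0, 0.0, 0.0], x))
```

In this sketch the layer keeps two values instead of four, and the weight gradient is computed from the reconstructed input rather than the original, so it deviates from the exact value (2.0 versus 1.0 here). Approximation error of this kind is the price paid for the smaller memory footprint.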
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques
Methods: Depthwise Convolution · Kaiming Initialization · Pointwise Convolution · Depthwise Separable Convolution · Sigmoid Activation · Batch Normalization · Max Pooling · Convolution
