SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska

TL;DR
SuperNeurons introduces a dynamic GPU memory management system that enables training of deeper neural networks beyond GPU DRAM limits without sacrificing performance.
Contribution
It presents a novel runtime with three memory optimization techniques that significantly extend trainable network depth while maintaining high training performance.
Findings
Enables training of networks over 3 times deeper than existing methods.
Successfully trains ResNet2500 with 10,000 layers on a 12GB GPU.
Achieves at least 3.24 times deeper networks with comparable performance.
Abstract
Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desired network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime to enable network training far beyond the GPU DRAM capacity. SuperNeurons features 3 memory optimizations: Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation. Together, they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in those memory saving techniques. Given the limited GPU DRAM,…
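To make the first of the three optimizations more concrete, here is a minimal Python sketch of the general idea behind liveness analysis: a tensor is released as soon as no later step reads it. It is an illustration only, not the SuperNeurons implementation. The trace format, the tensor names, the sizes, and the function `peak_memory` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical trace entry: (step name, tensor produced, size in MB, tensors read).
Step = Tuple[str, str, int, List[str]]


def peak_memory(steps: List[Step], use_liveness: bool) -> int:
    """Return the peak resident memory (MB) over the whole trace."""
    # Index of the last step that touches each tensor (producing counts as a touch).
    last_use: Dict[str, int] = {}
    for i, (_, out, _, reads) in enumerate(steps):
        last_use[out] = i
        for name in reads:
            last_use[name] = i

    live: Dict[str, int] = {}  # tensors currently resident -> size in MB
    peak = 0
    for i, (_, out, size, _) in enumerate(steps):
        live[out] = size
        peak = max(peak, sum(live.values()))
        if use_liveness:
            # Free every tensor whose last use is the current step.
            for name in [n for n in live if last_use[n] == i]:
                del live[name]
    return peak


# Assumed toy network: three forward steps followed by three backward steps.
trace: List[Step] = [
    ("fwd1", "a1", 100, []),
    ("fwd2", "a2", 200, ["a1"]),
    ("fwd3", "a3", 50, ["a2"]),
    ("bwd3", "g3", 50, ["a3"]),
    ("bwd2", "g2", 200, ["a2", "g3"]),
    ("bwd1", "g1", 100, ["a1", "g2"]),
]

if __name__ == "__main__":
    print("peak without liveness (MB):", peak_memory(trace, use_liveness=False))
    print("peak with liveness (MB):", peak_memory(trace, use_liveness=True))
```

On this toy trace, the liveness-based run reports a lower peak than keeping every tensor resident. The other two optimizations named in the abstract, the Unified Tensor Pool and Cost-Aware Recomputation, are not modeled here.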
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Stochastic Gradient Optimization Techniques
Methods: Convolution
