Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
Siddharth Singh, Abhinav Bhatele

TL;DR
This paper introduces a method to leverage sparsity in pruned neural networks to significantly reduce memory and communication overheads during large-scale parallel training, improving efficiency and speed.
Contribution
The authors develop an approach, integrated into AxoNN, that exploits the sparsity of pruned networks to reduce memory utilization and communication in data and inter-layer parallelism.
Findings
74% reduction in memory consumption
40% decrease in communication time
34% overall speedup over AxoNN
Abstract
Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e., setting to zero) 80-90% of the parameters in a neural network to yield sparse subnetworks that equal the accuracy of the unpruned parent network. In this work, we propose a novel approach that exploits these sparse subnetworks to optimize the memory utilization and communication in two popular algorithms for parallel deep learning, namely data and inter-layer parallelism. We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning that relies on data and inter-layer parallelism, and demonstrate the reduction in communication time and memory utilization. On 512 NVIDIA V100 GPUs, our optimizations reduce the memory…
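The general idea in the abstract, that a pruned network only needs its surviving parameters stored and communicated, can be illustrated with a small sketch. This is not AxoNN's implementation: the function names (`magnitude_prune_mask`, `compress`, `decompress`, `all_reduce_mean`), the use of magnitude pruning, and the simulated in-process all-reduce are all assumptions made for illustration. It also assumes every worker shares the same fixed pruning mask, so only the kept values (not their indices) need to be exchanged in each step.

```python
import random


def magnitude_prune_mask(weights, sparsity):
    """Return the sorted indices kept after dropping the smallest-magnitude weights.

    Magnitude pruning is an assumption here; the paper covers pruning algorithms in general.
    """
    n_keep = len(weights) - round(len(weights) * sparsity)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(ranked[:n_keep])


def compress(dense, kept):
    """Keep only the values at the unpruned positions."""
    return [dense[i] for i in kept]


def decompress(values, kept, size):
    """Scatter compressed values back into a dense buffer of zeros."""
    dense = [0.0] * size
    for i, v in zip(kept, values):
        dense[i] = v
    return dense


def all_reduce_mean(buffers):
    """Simulated data-parallel all-reduce: element-wise mean across workers."""
    n = len(buffers)
    return [sum(column) / n for column in zip(*buffers)]


random.seed(0)
size, sparsity, workers = 1000, 0.9, 4  # hypothetical sizes for illustration

weights = [random.gauss(0.0, 1.0) for _ in range(size)]
kept = magnitude_prune_mask(weights, sparsity)
kept_set = set(kept)

# Each worker's gradient is zero wherever the parameter was pruned.
grads = [
    [random.gauss(0.0, 1.0) if i in kept_set else 0.0 for i in range(size)]
    for _ in range(workers)
]

# Dense baseline: every worker communicates all `size` values.
dense_result = all_reduce_mean(grads)

# Sparse variant: every worker communicates only the kept values.
compressed = [compress(g, kept) for g in grads]
sparse_result = decompress(all_reduce_mean(compressed), kept, size)

print("values exchanged per worker (dense): ", size)
print("values exchanged per worker (sparse):", len(kept))
print("results identical:", dense_result == sparse_result)
```

In this sketch the compressed buffers carry one tenth of the values of the dense ones while producing the same averaged gradient. A real system would also have to account for the cost of storing the index list and for how compressed tensors interact with GPU kernels, which this example does not model.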
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Pruning
