Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

Siddharth Singh; Abhinav Bhatele

arXiv:2302.05045 · cs.LG · May 16, 2023

TL;DR

This paper introduces a method that exploits the sparsity of pruned neural networks to reduce memory and communication overheads in large-scale parallel training, which shortens overall training time.

Contribution

The authors develop an approach, integrated into AxoNN, that exploits sparsity to reduce memory usage and communication in two parallel training algorithms: data parallelism and inter-layer parallelism.

Findings

01. 74% reduction in memory consumption
02. 40% decrease in communication time
03. 34% overall speedup over AxoNN

Abstract

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e., setting to zero) 80-90% of the parameters in a neural network to yield sparse subnetworks that equal the accuracy of the unpruned parent network. In this work, we propose a novel approach that exploits these sparse subnetworks to optimize the memory utilization and communication in two popular algorithms for parallel deep learning, namely data parallelism and inter-layer parallelism. We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning that relies on data and inter-layer parallelism, and demonstrate the reduction in communication time and memory utilization. On 512 NVIDIA V100 GPUs, our optimizations reduce the memory…
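The abstract does not spell out the mechanism, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' AxoNN implementation. It assumes the network is pruned once and its mask stays fixed during training, so every data-parallel worker knows which weights are permanently zero and can keep and exchange only the gradients of unpruned weights. The helper names (`mask_indices`, `compress`, `decompress`, `simulated_allreduce_mean`, `sparse_data_parallel_step`) are hypothetical, and the all-reduce is simulated in a single process with an element-wise mean rather than a real collective library.

```python
# Hypothetical sketch (not AxoNN's code): exploiting a fixed pruning mask
# in data-parallel gradient averaging.
# Assumption: all workers share the same mask, and pruned weights stay zero,
# so their gradients can be dropped instead of stored and communicated.
from typing import List, Sequence, Tuple


def mask_indices(mask: Sequence[int]) -> List[int]:
    """Return the positions of unpruned parameters (mask value 1)."""
    return [i for i, keep in enumerate(mask) if keep]


def compress(dense: Sequence[float], idx: Sequence[int]) -> List[float]:
    """Gather only the entries at unpruned positions."""
    return [dense[i] for i in idx]


def decompress(values: Sequence[float], idx: Sequence[int], size: int) -> List[float]:
    """Scatter compressed values back into a dense vector, zeros elsewhere."""
    out = [0.0] * size
    for v, i in zip(values, idx):
        out[i] = v
    return out


def simulated_allreduce_mean(per_worker: Sequence[Sequence[float]]) -> List[float]:
    """Single-process stand-in for an all-reduce: element-wise mean over workers."""
    n = len(per_worker)
    return [sum(col) / n for col in zip(*per_worker)]


def sparse_data_parallel_step(
    grads: Sequence[Sequence[float]], mask: Sequence[int]
) -> Tuple[List[float], int, int]:
    """Average gradients across workers while exchanging only unpruned entries.

    Returns the averaged dense gradient, the number of values each worker
    sends when exploiting the mask, and the number it would send otherwise.
    """
    idx = mask_indices(mask)  # computed once; the mask itself is never sent
    compressed = [compress(g, idx) for g in grads]
    reduced = simulated_allreduce_mean(compressed)
    return decompress(reduced, idx, len(mask)), len(idx), len(mask)


# Toy usage: 10 parameters, 70% pruned, two data-parallel workers.
mask = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
grads = [
    [1.0, 5.0, 5.0, 2.0, 5.0, 5.0, 5.0, 5.0, 3.0, 5.0],
    [3.0, 7.0, 7.0, 4.0, 7.0, 7.0, 7.0, 7.0, 5.0, 7.0],
]
avg, sent, dense = sparse_data_parallel_step(grads, mask)
print(avg)          # [2.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0]
print(sent, dense)  # 3 10
```

In this toy setup each worker stores and sends 3 gradient values per step instead of 10; because the mask is fixed, the index list is computed once and never communicated. The sketch covers only the data-parallel case; inter-layer parallelism is not modeled.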

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Pruning