Partitioning sparse deep neural networks for scalable training and inference
Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu

TL;DR
This paper presents a scalable distributed-memory approach for training and inference of sparse deep neural networks, focusing on efficient sparse matrix operations and novel partitioning methods to enhance performance.
Contribution
It introduces a hypergraph-based partitioning scheme for sparse matrices, improving scalability and reducing communication in distributed training of sparse DNNs.
Findings
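The distributed-memory approach is reported to be efficient and scalable for both training and inference of sparse DNNs.
Hypergraph partitioning of the sparse weight matrices reduces communication volume.
The new partitioning scheme is reported to yield significant performance improvements.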
Abstract
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements. The sizes of both training data and models continue to increase. Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs. The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning. Both the feedforward (inference) and backpropagation steps in the stochastic gradient descent (SGD) algorithm for training sparse DNNs involve consecutive sparse matrix-vector multiplications (SpMVs). We first introduce a distributed-memory parallel SpMV-based solution for the SGD algorithm to improve its scalability. The parallelization approach is based on row-wise partitioning of weight matrices that represent neuron connections between consecutive layers.…
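The row-wise scheme described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the dict-of-rows sparse format, the ReLU activation, the helper names, and the contiguous block assignment of rows (a naive stand-in for the paper's hypergraph partitioner) are all hypothetical. Communication volume is counted here as the number of distinct input-vector entries that each part would have to receive from another part, a common measure for row-parallel SpMV.

```python
# Illustrative sketch: simulates n_parts "processors" inside one process.
# Each sparse row is a {column: weight} dict (hypothetical format).

def relu(v):
    # Assumed activation; the abstract does not name one.
    return v if v > 0.0 else 0.0

def rowwise_partition(n_rows, n_parts):
    # Naive contiguous blocks of rows; NOT hypergraph partitioning.
    return [r * n_parts // n_rows for r in range(n_rows)]

def feedforward_layer(W, x, row_owner, x_owner, n_parts):
    """Compute y = relu(W x) with the rows of W split across parts.

    Also returns the communication volume: the number of distinct
    x entries each part needs but does not own, summed over parts.
    """
    y = [0.0] * len(W)
    needed = [set() for _ in range(n_parts)]
    for i, row in enumerate(W):
        p = row_owner[i]
        acc = 0.0
        for j, w in row.items():
            acc += w * x[j]
            if x_owner[j] != p:
                needed[p].add(j)  # x[j] must be fetched from its owner
        y[i] = relu(acc)
    return y, sum(len(s) for s in needed)

def feedforward(layers, x, n_parts):
    """Run consecutive SpMVs, one per layer, and total the volume."""
    x_owner = rowwise_partition(len(x), n_parts)
    total = 0
    for W in layers:
        row_owner = rowwise_partition(len(W), n_parts)
        x, vol = feedforward_layer(W, x, row_owner, x_owner, n_parts)
        total += vol
        # Output entry i stays with the part that owns row i.
        x_owner = row_owner
    return x, total

# Two small sparse layers: 4 -> 4 -> 2 neurons.
layers = [
    [{0: 1.0, 1: 2.0}, {1: -1.0}, {2: 0.5, 3: 1.0}, {0: 1.0, 3: -2.0}],
    [{0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}],
]
x0 = [1.0, 2.0, 3.0, 4.0]
out, total = feedforward(layers, x0, n_parts=2)
print(out, total)
```

In this toy setting, a better assignment of rows to parts would lower `total` without changing `out`, which is the quantity a partitioning method such as the hypergraph model aims to reduce.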
Taxonomy
Methods: Pruning · Stochastic Gradient Descent
