Partitioning sparse deep neural networks for scalable training and inference

Gunduz Vehbi Demirci; Hakan Ferhatosmanoglu

arXiv:2104.11805 · cs.LG · April 27, 2021

TL;DR

This paper presents a scalable distributed-memory approach for training and inference of sparse deep neural networks, focusing on efficient sparse matrix operations and novel partitioning methods to enhance performance.

Contribution

It introduces a hypergraph-based partitioning scheme for sparse matrices, improving scalability and reducing communication in distributed training of sparse DNNs.
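To make "reducing communication" concrete, the sketch below counts how many input-vector entries must be sent between processors for one sparse layer under a given row-wise partition. This page does not describe the paper's hypergraph model, so the counting rule here is an assumption: an entry `x[j]` is sent once to every part that holds a nonzero in column `j` and does not already own `x[j]`. The names `comm_volume`, `row_part`, and `x_owner` are hypothetical.

```python
from typing import Dict, List, Set

SparseRow = Dict[int, float]      # column index -> nonzero weight
SparseMatrix = List[SparseRow]    # one dict per row (output neuron)


def comm_volume(W: SparseMatrix, row_part: List[int], x_owner: List[int]) -> int:
    """Count vector entries that must cross part boundaries for y = W x.

    row_part[i] is the part that owns row i of W.
    x_owner[j] is the part that owns input entry x[j].
    Assumption: each needed x[j] is sent once to each non-owning part.
    """
    needers: Dict[int, Set[int]] = {}
    for i, row in enumerate(W):
        for j in row:
            needers.setdefault(j, set()).add(row_part[i])
    return sum(len(parts - {x_owner[j]}) for j, parts in needers.items())


# A 4x4 block-diagonal layer: rows 0-1 use columns 0-1, rows 2-3 use columns 2-3.
W = [
    {0: 1.0, 1: 1.0},
    {0: 1.0, 1: 1.0},
    {2: 1.0, 3: 1.0},
    {2: 1.0, 3: 1.0},
]
x_owner = [0, 0, 1, 1]

aligned = comm_volume(W, [0, 0, 1, 1], x_owner)      # rows grouped by block
interleaved = comm_volume(W, [0, 1, 0, 1], x_owner)  # rows split across blocks
print(aligned, interleaved)
```

A partition that keeps each block's rows together needs no communication in this toy case, while the interleaved one sends every input entry once. A partitioner that accounts for the sparsity pattern aims to find assignments closer to the first.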

Findings

01 The proposed method is highly efficient and scalable.

02 Hypergraph partitioning reduces communication volume.

03 Performance improvements are significant with the new scheme.

Abstract

State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements. The sizes of both training data and models continue to increase. Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs. The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning. Both the feedforward (inference) and backpropagation steps in the stochastic gradient descent (SGD) algorithm for training sparse DNNs involve consecutive sparse matrix-vector multiplications (SpMVs). We first introduce a distributed-memory parallel SpMV-based solution for the SGD algorithm to improve its scalability. The parallelization approach is based on row-wise partitioning of weight matrices that represent neuron connections between consecutive layers.…
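The feedforward step described in the abstract can be sketched as a chain of SpMVs in which each part computes only the output entries for the rows it owns. This is a minimal single-process illustration, not the paper's implementation: the contiguous block partition, the ReLU activation, the absence of bias terms, and the names `partition_rows`, `local_spmv`, and `feedforward` are all assumptions made for the example.

```python
from typing import Dict, List

SparseRow = Dict[int, float]      # column index -> nonzero weight
SparseMatrix = List[SparseRow]    # one dict per row (output neuron)


def relu(v: float) -> float:
    # Assumed activation; the abstract does not name one.
    return v if v > 0.0 else 0.0


def partition_rows(num_rows: int, num_parts: int) -> List[List[int]]:
    """Split row indices into contiguous blocks (a naive, assumed scheme)."""
    base, extra = divmod(num_rows, num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def local_spmv(W: SparseMatrix, rows: List[int], x: List[float]) -> Dict[int, float]:
    """Compute the activated output entries for the rows one part owns."""
    return {i: relu(sum(w * x[j] for j, w in W[i].items())) for i in rows}


def feedforward(layers: List[SparseMatrix], x: List[float], num_parts: int) -> List[float]:
    """Apply consecutive SpMVs, one per layer, with row-wise partitioning."""
    for W in layers:
        y = [0.0] * len(W)
        for rows in partition_rows(len(W), num_parts):
            # In a distributed run each part would execute concurrently and
            # then exchange the output entries that other parts need.
            for i, v in local_spmv(W, rows, x).items():
                y[i] = v
        x = y
    return x


# Two sparse layers: 2 inputs -> 3 hidden neurons -> 2 outputs.
W1 = [{0: 1.0}, {1: 2.0}, {0: -1.0, 1: 1.0}]
W2 = [{0: 1.0, 2: 1.0}, {1: -1.0}]
out = feedforward([W1, W2], [1.0, 2.0], num_parts=2)
print(out)
```

The result does not depend on `num_parts`; the partition only determines which part computes which rows, and therefore which vector entries would have to be exchanged between layers.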

Taxonomy

Methods: Pruning · Stochastic Gradient Descent