At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

Mert Hidayetoglu; Carl Pearson; Vikram Sharma Mailthody; Eiman Ebrahimi; Jinjun Xiong; Rakesh Nagi; Wen-Mei Hwu

arXiv:2007.14152·cs.DC·September 4, 2020

TL;DR

This paper introduces optimized GPU kernels and multi-GPU strategies for sparse deep neural network inference, achieving significant speedups and efficiency improvements over previous methods and champion solutions.

Contribution

It presents novel fused sparse matrix multiplication kernels and a multi-GPU parallelization approach tailored for sparse DNN inference on GPUs.

Findings

01

Up to 180 tera-edges/sec inference throughput.

02

4.3x faster single-GPU performance than the 2019 champion.

03

2.37x throughput improvement on NVIDIA A100 over V100.

Abstract

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory requirements of many neural networks beyond the capacity of available accelerators. Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks. However, there is room for improvement in implementing SpDNN operations on GPUs. This work presents optimized sparse matrix multiplication kernels fused with the ReLU function. The optimized kernels reuse input feature maps from the shared memory and sparse weights from registers. For multi-GPU parallelism, our SpDNN implementation duplicates weights and statically partitions the feature maps across GPUs. Results for the challenge benchmarks show that the proposed…
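The two ideas named in the abstract, a sparse matrix multiplication fused with ReLU and a static partition of feature maps with duplicated weights, can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions: the function names, the CSR weight layout, and the contiguous chunking are hypothetical choices, not taken from the authors' repository, and the sketch does not model the shared-memory or register reuse that the GPU kernels rely on.

```python
# Illustrative sketch only. All names and the CSR layout are assumptions,
# not the authors' implementation; GPUs are simulated by a plain loop.

def fused_spmm_relu(features, w_indptr, w_indices, w_data):
    """Compute ReLU(W @ x) for each feature vector x, with W stored in CSR form.

    ReLU is applied as each output value is produced, so the pre-activation
    product is never stored as a separate matrix.
    """
    n_rows = len(w_indptr) - 1
    outputs = []
    for x in features:                       # one input feature vector
        y = [0.0] * n_rows
        for r in range(n_rows):              # one output neuron per CSR row
            acc = 0.0
            for k in range(w_indptr[r], w_indptr[r + 1]):
                acc += w_data[k] * x[w_indices[k]]
            y[r] = acc if acc > 0.0 else 0.0  # fused ReLU
        outputs.append(y)
    return outputs


def partition_static(features, n_devices):
    """Split feature vectors into contiguous, near-equal chunks, one per device."""
    base, extra = divmod(len(features), n_devices)
    chunks, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        chunks.append(features[start:start + size])
        start += size
    return chunks


def infer_multi_device(features, layers, n_devices):
    """Every simulated device sees the full list of layers and runs only its chunk."""
    results = []
    for chunk in partition_static(features, n_devices):
        for indptr, indices, data in layers:  # weights duplicated per device
            chunk = fused_spmm_relu(chunk, indptr, indices, data)
        results.extend(chunk)
    return results


if __name__ == "__main__":
    # W = [[1, 0, -2], [0, 3, 0]] in CSR form (hypothetical toy weights).
    layer = ([0, 2, 3], [0, 2, 1], [1.0, -2.0, 3.0])
    inputs = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
    print(infer_multi_device(inputs, [layer], n_devices=2))
```

Because each feature vector is processed independently of the others, the partitioned run produces the same outputs as a single-device run; in this sketch no data is exchanged between devices during inference.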

Code & Models

Repositories

merthidayetoglu/SpDNN_Challenge2020
Official
