When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity
Marcin Pietroń, Dominik Żurek

TL;DR
This paper investigates how unstructured sparsity in CNN models, achieved through non-structural pruning, can be exploited to accelerate convolution operations on GPUs, especially when combined with reduced precision techniques.
Contribution
It demonstrates the conditions under which direct sparse convolution operations improve GPU efficiency and evaluates the impact of reduced precision on performance.
Findings
Under certain conditions, direct sparse convolution with unstructured sparsity runs faster on GPUs.
Non-structural pruning achieves high sparsity levels (~90%) in CNN models.
Reduced precision further enhances time efficiency in sparse CNN computations.
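The idea behind direct sparse convolution can be illustrated with a minimal CPU sketch in NumPy. This is not the authors' GPU kernel: the function names `dense_conv2d` and `sparse_conv2d`, the tensor shapes, and the random 90% mask are all assumptions made for illustration. The sparse variant visits only the nonzero weights, so its work scales with the number of surviving weights rather than with the full filter size.

```python
import numpy as np

def dense_conv2d(x, w):
    """Reference dense convolution (cross-correlation), stride 1, no padding.
    x: (C_in, H, W), w: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for o in range(c_out):
        for c in range(c_in):
            for i in range(k):
                for j in range(k):
                    # Every weight is visited, including the zeros.
                    y[o] += w[o, c, i, j] * x[c, i:i + h_out, j:j + w_out]
    return y

def sparse_conv2d(x, w):
    """Direct sparse convolution: accumulate contributions of nonzero weights only."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    # np.nonzero yields the coordinates (o, c, i, j) of the surviving weights.
    for o, c, i, j in zip(*np.nonzero(w)):
        y[o] += w[o, c, i, j] * x[c, i:i + h_out, j:j + w_out]
    return y

# Hypothetical layer: 3 input channels, 4 filters of size 3x3, 8x8 input.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8)).astype(np.float32)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
w[rng.random(w.shape) < 0.9] = 0.0  # roughly 90% unstructured sparsity

y_dense = dense_conv2d(x, w)
y_sparse = sparse_conv2d(x, w)
```

Both functions return the same result; the sparse one simply skips the multiplications by zero. Whether that saving turns into a real speed-up on a GPU depends on memory access patterns and load balancing, which this sketch does not model.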
Abstract
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The NVIDIA CUDA Deep Neural Network library (cuDNN) provides the most effective implementation of deep learning (DL) algorithms for GPUs. GPUs are among the most efficient and commonly used accelerators for deep learning computations. Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution. One of the most common techniques for compressing CNN models is weight pruning. There are two main types of pruning: structural (removing whole weight channels) and non-structural (removing individual weights). The first is much easier to accelerate, but it is difficult for it to reach a sparsity level and accuracy as high as those obtained with the second. Non-structural…
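The two pruning types named in the abstract can be contrasted with a short NumPy sketch. The selection criteria used here (smallest absolute value for individual weights, smallest L1 norm for channels) are assumptions chosen for illustration, since the abstract does not state which criterion the authors use, and the function names `prune_unstructured` and `prune_channels` are hypothetical.

```python
import numpy as np

def prune_unstructured(w, sparsity):
    """Non-structural pruning: zero the `sparsity` fraction of individual
    weights with the smallest magnitude (assumed criterion)."""
    n_prune = int(round(sparsity * w.size))
    pruned = w.copy()
    if n_prune == 0:
        return pruned
    # Indices of the n_prune smallest-magnitude weights in the flattened tensor.
    idx = np.argpartition(np.abs(w).ravel(), n_prune - 1)[:n_prune]
    flat = pruned.reshape(-1)  # view onto the contiguous copy
    flat[idx] = 0.0
    return pruned

def prune_channels(w, fraction):
    """Structural pruning: zero whole output channels with the smallest
    L1 norm (assumed criterion). w: (C_out, C_in, K, K)."""
    n_prune = int(round(fraction * w.shape[0]))
    pruned = w.copy()
    norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
    pruned[np.argsort(norms)[:n_prune]] = 0.0
    return pruned

# Hypothetical layer: 16 filters, 8 input channels, 3x3 kernels (1152 weights).
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8, 3, 3)).astype(np.float32)

w_unstructured = prune_unstructured(w, 0.9)  # zeros scattered across all filters
w_structured = prune_channels(w, 0.5)        # 8 of 16 filters removed entirely
```

Channel pruning leaves a smaller dense tensor that standard dense kernels can process directly, which is why it is easier to accelerate. Non-structural pruning leaves zeros at irregular positions, so the speed-up depends on a kernel that can skip them.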
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
Methods: Pruning · 1x1 Convolution · Residual Connection · Average Pooling · Bottleneck Residual Block · Batch Normalization · Max Pooling · Residual Block · Kaiming Initialization
