Accelerating convolutional neural network by exploiting sparsity on GPUs
Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu

TL;DR
This paper introduces two GPU-based methods that accelerate CNNs by exploiting feature-map sparsity and reducing CPU-GPU data transfer, achieving an average 4.3X speedup over cuDNN on VGG-19 when convolution and pooling are combined.
Contribution
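It proposes two techniques: a convolution method that skips the multiplications and additions for zero values in sparse feature maps, and a method that combines each convolution layer with the following pooling layer to reduce CPU-GPU data transfer.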
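To make the sparsity idea concrete, here is a minimal CPU-side sketch in plain Python. It is not the paper's GPU kernel: the function names, the single-channel stride-1 "valid" convolution, and the list of (row, col, value) triples used to hold nonzero entries are all assumptions made for illustration. It only shows that skipping zero inputs gives the same result with fewer multiplications.

```python
def conv2d_dense(x, w):
    """Reference convolution (stride 1, no padding) that visits every input value."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    mults = 0
    for i in range(out_h):
        for j in range(out_w):
            for a in range(k):
                for b in range(k):
                    out[i][j] += x[i + a][j + b] * w[a][b]
                    mults += 1
    return out, mults


def conv2d_skip_zeros(x, w):
    """Same result, but only nonzero feature-map entries are multiplied."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    mults = 0
    # Hypothetical compressed form: one (row, col, value) triple per nonzero entry.
    nonzeros = [(r, c, v)
                for r, row in enumerate(x)
                for c, v in enumerate(row) if v != 0]
    for r, c, v in nonzeros:
        # Input (r, c) contributes v * w[a][b] to output (r - a, c - b).
        for a in range(k):
            i = r - a
            if not 0 <= i < out_h:
                continue
            for b in range(k):
                j = c - b
                if 0 <= j < out_w:
                    out[i][j] += v * w[a][b]
                    mults += 1
    return out, mults


# Illustrative data: a mostly-zero 4x4 feature map and a 3x3 kernel.
feature_map = [[0, 2, 0, 0],
               [0, 0, 0, 1],
               [3, 0, 0, 0],
               [0, 0, 0, 0]]
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]

dense_out, dense_mults = conv2d_dense(feature_map, kernel)
sparse_out, sparse_mults = conv2d_skip_zeros(feature_map, kernel)
print(dense_out == sparse_out, dense_mults, sparse_mults)
```

The saving grows with the fraction of zeros in the feature map. On a GPU the practical difficulty is keeping the irregular nonzero work balanced across threads, which this sketch does not attempt to model.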
Findings
Up to 3.6X speedup on single-layer convolution for LeNet, AlexNet, GoogLeNet.
Average 3.5X speedup on VGG-19 convolution operations.
Average 4.3X speedup on VGG-19 when combining convolution and pooling.
Abstract
Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and additions for zero values in the feature map are useless for convolution results. In addition, the convolution layer and pooling layer are computed separately in traditional methods, which leads to frequent data transfer between CPU and GPU. Based on these observations, we propose two new methods to accelerate CNN on GPUs. The first method focuses on accelerating convolution operation and reducing the calculation of zero values. The second method combines the operations of one convolution layer with the following pooling layer to effectively reduce traffic between CPU and GPU. For the first method, we extract some convolution layers from LeNet,…
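The second method, combining a convolution layer with the following pooling layer, can be illustrated with a small plain-Python sketch. This is an assumption-laden illustration rather than the authors' implementation: the helper names are hypothetical, pooling is assumed to be non-overlapping 2x2 max pooling, and the convolution is single-channel with stride 1 and no padding. The point is that the fused version never stores the full intermediate convolution map, which is the data that would otherwise move between the two separate layer computations.

```python
def conv_at(x, w, i, j):
    """Convolution output at position (i, j), stride 1, no padding."""
    k = len(w)
    return sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))


def conv_then_pool(x, w, p=2):
    """Unfused reference: materialize the whole convolution map, then max-pool it."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    conv = [[conv_at(x, w, i, j) for j in range(out_w)] for i in range(out_h)]
    return [[max(conv[i + di][j + dj] for di in range(p) for dj in range(p))
             for j in range(0, out_w - p + 1, p)]
            for i in range(0, out_h - p + 1, p)]


def fused_conv_pool(x, w, p=2):
    """Fused version: each pooled value is reduced directly from its conv window."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    pooled = []
    for i in range(0, out_h - p + 1, p):
        row = []
        for j in range(0, out_w - p + 1, p):
            # Only the p*p values of the current window exist at any time.
            row.append(max(conv_at(x, w, i + di, j + dj)
                           for di in range(p) for dj in range(p)))
        pooled.append(row)
    return pooled


# Illustrative 6x6 input and 3x3 kernel: 4x4 convolution map, 2x2 pooled output.
image = [[1, 0, 2, 0, 0, 1],
         [0, 3, 0, 0, 1, 0],
         [0, 0, 0, 4, 0, 0],
         [2, 0, 1, 0, 0, 0],
         [0, 0, 0, 0, 5, 0],
         [1, 0, 0, 2, 0, 0]]
kernel_3x3 = [[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]]

unfused = conv_then_pool(image, kernel_3x3)
fused = fused_conv_pool(image, kernel_3x3)
print(unfused == fused, fused)
```

In a GPU setting the same idea would mean one kernel producing the pooled output directly, rather than writing the convolution result out and reading it back for a separate pooling step.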
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
