Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

TL;DR
This paper introduces a method for pruning CNNs that reduces model size and improves inference speed at the same time. It combines an efficient general sparse-with-dense matrix multiplication with a predictive performance model, and reports 3.1–7.3× convolution speedups over dense convolution in AlexNet.
Contribution
The paper presents a new pruning technique combined with an efficient sparse convolution implementation and a performance model, enabling faster CNN inference without sacrificing sparsity.
Findings
Achieved 3.1–7.3× speedups over dense convolution in AlexNet.
Demonstrated effectiveness across various hardware platforms from mobile to supercomputers.
Provided an open-source implementation for practical adoption.
Abstract
Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably, as the compute-heavy parts lie in convolutions. Pruning CNNs in a way that increases inference speed often imposes specific sparsity structures, thus limiting the achievable sparsity levels. We present a method to realize simultaneously size economy and speed improvement while pruning CNNs. Paramount to our success is an efficient general sparse-with-dense matrix…
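The abstract excerpt is cut off before it describes the sparse-with-dense multiplication, so the following is only a rough sketch of the general idea of convolving directly with pruned (sparse) filters, not the authors' implementation. It assumes stride 1, no padding, and cross-correlation as used in CNN layers; the function name `direct_sparse_conv` and the tensor layouts are hypothetical choices made for illustration. Each nonzero weight is turned into a flat offset into the input tensor, so the input is never unrolled into a large dense matrix and zero weights cost nothing.

```python
import numpy as np

def direct_sparse_conv(x, w):
    """Illustrative sparse convolution (stride 1, no padding).

    x: input of shape (C, H, W)
    w: pruned filters of shape (M, C, R, S), mostly zeros
    Returns an array of shape (M, H - R + 1, W - S + 1).
    """
    C, H, W = x.shape
    M, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1

    x_flat = x.ravel()
    # Flat offset, within the input, of the top-left input pixel
    # that corresponds to every output position.
    base = np.arange(out_h)[:, None] * W + np.arange(out_w)[None, :]

    out = np.zeros((M, out_h, out_w), dtype=np.result_type(x, w))
    for m in range(M):
        # Coordinates and values of the nonzero weights of filter m,
        # similar to one row of a compressed sparse row (CSR) matrix.
        c_idx, r_idx, s_idx = np.nonzero(w[m])
        vals = w[m][c_idx, r_idx, s_idx]
        # Offset of weight (c, r, s) into the flattened input.
        offsets = (c_idx * H + r_idx) * W + s_idx
        for v, off in zip(vals, offsets):
            # One nonzero weight scales a shifted window of the input.
            out[m] += v * x_flat[base + off]
    return out
```

The work done by the inner loop is proportional to the number of nonzero weights, which is why higher sparsity can translate into less computation in a scheme like this. Whether it is actually faster than a dense kernel depends on the hardware and the sparsity level, which is the kind of question the paper's performance model is meant to address.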
Taxonomy
Topics: Advanced Neural Network Applications · Advanced SAR Imaging Techniques · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · 1x1 Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
