Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Jongsoo Park; Sheng Li; Wei Wen; Ping Tak Peter Tang; Hai Li; Yiran Chen; Pradeep Dubey

arXiv:1608.01409 · cs.CV · August 1, 2017 · 120 cites

TL;DR

This paper introduces a method for pruning CNNs that reduces model size and improves inference speed at the same time. It relies on an efficient general sparse-with-dense matrix multiplication and a predictive performance model, and it reports speedups across diverse hardware.

Contribution

The paper presents a new pruning technique combined with an efficient sparse convolution implementation and a performance model, enabling faster CNN inference without sacrificing sparsity.

Findings

01 Achieved 3.1–7.3× speedups over dense convolution in AlexNet.

02 Demonstrated effectiveness across various hardware platforms, from mobile devices to supercomputers.

03 Provided an open-source implementation for practical adoption.

Abstract

Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably, as the compute-heavy parts lie in convolutions. Pruning CNNs in a way that increases inference speed often imposes specific sparsity structures, thus limiting the achievable sparsity levels. We present a method to realize size economy and speed improvement simultaneously while pruning CNNs. Paramount to our success is an efficient general sparse-with-dense matrix…
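To make the idea of multiplying sparse weights with dense activations more concrete, the following is a minimal NumPy sketch of a convolution that iterates only over nonzero weights. It is an illustration under stated assumptions, not the authors' implementation: the function names `to_csr` and `sparse_conv2d`, the CSR layout, and the restriction to stride 1 without padding are all choices made here for brevity.

```python
import numpy as np

def to_csr(dense):
    """Convert a 2-D array to CSR form: (values, column indices, row pointers)."""
    values, colidx, rowptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        colidx.extend(nz)
        rowptr.append(len(values))
    return (np.array(values, dtype=dense.dtype),
            np.array(colidx, dtype=np.int64),
            np.array(rowptr, dtype=np.int64))

def sparse_conv2d(x, w):
    """Hypothetical helper: convolve x (C, H, W) with pruned filters w (M, C, R, S).

    Assumes stride 1 and no padding. Only nonzero weights contribute work.
    """
    C, H, W = x.shape
    M, _, R, S = w.shape
    E, F = H - R + 1, W - S + 1  # output height and width

    # "Stretch" each filter so that a column index equals the offset of the
    # matching input element in the flattened (C*H*W) input.
    stretched = np.zeros((M, C * H * W), dtype=w.dtype)
    for c in range(C):
        for r in range(R):
            for s in range(S):
                stretched[:, (c * H + r) * W + s] = w[:, c, r, s]
    values, colidx, rowptr = to_csr(stretched)

    x_flat = x.reshape(-1)
    # Flattened offset of the top-left corner of every output position.
    base = np.arange(E)[:, None] * W + np.arange(F)[None, :]

    out = np.zeros((M, E, F), dtype=np.result_type(x, w))
    for m in range(M):                      # one CSR row per output channel
        for j in range(rowptr[m], rowptr[m + 1]):
            # Each nonzero weight scales a shifted window of the dense input.
            out[m] += values[j] * x_flat[base + colidx[j]]
    return out
```

In this sketch the cost of the inner loop scales with the number of nonzero weights rather than with the full filter size, which is the property that makes pruning the convolution layers pay off in inference time. A production kernel would vectorize and block these loops for the target hardware.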

Code & Models

Repositories

IntelLabs/SkimCaffe (official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced SAR Imaging Techniques · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · 1x1 Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax