PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

TL;DR
This paper introduces PCONV, a novel sparsity pattern for DNN weight pruning that combines intra- and inter-convolution pruning to enable real-time, accurate model inference on mobile devices.
Contribution
The paper proposes PCONV, a new pruning method with fine-grained patterns inside coarse structures, and develops a compiler-assisted framework for real-time deployment without accuracy loss.
Findings
PCONV achieves up to 39.2x speedup over TensorFlow-Lite.
PCONV maintains accuracy while significantly increasing pruning rate.
Real-time inference on large DNNs is feasible on mobile devices.
Abstract
Model compression techniques for Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream families of pruning methods, representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy, but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures in pruning, but suffers from an accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside the coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), which are generated from intra-convolution kernel pruning, and connectivity sparsity, generated from…
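The two sparsity types named in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the text shown here does not state: the kernels are 3x3, each pattern keeps 4 of the 9 weights, the four masks in `PATTERNS` are a hypothetical pattern set, and both selections are made by simple weight magnitude. Magnitude-based selection is swapped in purely for illustration and is not presented as the authors' procedure. The function names `pattern_prune` and `connectivity_prune` are likewise hypothetical.

```python
import numpy as np

# Hypothetical pattern set: each mask keeps 4 of the 9 weights in a 3x3 kernel.
PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
], dtype=np.float32)


def pattern_prune(weights):
    """Intra-kernel pruning: give every kernel the pattern that keeps the most energy.

    weights has shape (out_channels, in_channels, 3, 3).
    """
    # Energy each pattern would preserve in each kernel: shape (O, I, P).
    energy = np.einsum("oihw,phw->oip", weights ** 2, PATTERNS)
    best = energy.argmax(axis=-1)          # best pattern index per kernel, (O, I)
    return weights * PATTERNS[best]        # PATTERNS[best] has shape (O, I, 3, 3)


def connectivity_prune(weights, keep_ratio):
    """Inter-kernel pruning: zero out whole kernels with the smallest L2 norm."""
    norms = np.sqrt((weights ** 2).sum(axis=(2, 3)))      # one norm per kernel, (O, I)
    k = max(1, int(round(keep_ratio * norms.size)))       # number of kernels to keep
    threshold = np.sort(norms.ravel())[-k]                # k-th largest norm
    mask = (norms >= threshold).astype(weights.dtype)
    return weights * mask[:, :, None, None]


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3)).astype(np.float32)

# Apply pattern sparsity first, then remove half of the kernels entirely.
pruned = connectivity_prune(pattern_prune(w), keep_ratio=0.5)
nonzeros_per_kernel = (pruned != 0).reshape(8, 4, -1).sum(axis=-1)
```

In this sketch every surviving kernel has the same number of nonzero weights in one of a few known layouts, while removed kernels are entirely zero. That regularity is the kind of structure a compiler could exploit, which fits the page's description of fine-grained patterns inside coarse-grained structures.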
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Convolution
