Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?
Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

TL;DR
This paper evaluates the benefits of non-structured weight pruning in DNNs, demonstrating that it is generally less efficient than structured pruning in terms of storage and computation, and providing a new framework for fair comparison.
Contribution
The authors develop ADMM-NN-S, a comprehensive framework for joint weight pruning and quantization, and establish a methodology for fair comparison between structured and non-structured pruning methods.
Findings
ADMM-NN-S achieves significant weight pruning with minimal accuracy loss.
Fully binarized DNNs can be lossless in accuracy in many cases.
Non-structured pruning is less efficient than structured pruning in storage and computation.
Abstract
Large deep neural network (DNN) models pose a key challenge to energy efficiency because off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, with two main approaches. Weight pruning leverages redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rate but incurs index accesses due to the irregular weight layout, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization leverages redundancy in the number of bits per weight. Compared to pruning, quantization is much more hardware-friendly and has become a "must-do" step for FPGA and ASIC implementations. This paper provides, for the first time, a definitive answer to the question posed in the title. First, we build ADMM-NN-S by extending…
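The distinction the abstract draws between the two pruning styles can be illustrated with a small sketch. It is not the paper's ADMM-based method: the magnitude-based selection, the function names, and the storage model (one fixed-width index per surviving weight) are all assumptions made for illustration only.

```python
# Illustrative sketch only. Magnitude-based selection and the one-index-per-weight
# storage model are assumptions, not the ADMM-NN-S procedure from the paper.

def prune_non_structured(weights, keep):
    """Zero all but the `keep` largest-magnitude weights, wherever they sit."""
    ranked = sorted(
        ((abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)),
        reverse=True,
    )
    kept = {(r, c) for _, r, c in ranked[:keep]}
    return [
        [w if (r, c) in kept else 0.0 for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]

def prune_structured_rows(weights, keep_rows):
    """Keep the `keep_rows` rows with the largest L2 norm as a smaller dense matrix."""
    by_norm = sorted(
        range(len(weights)),
        key=lambda r: sum(w * w for w in weights[r]),
        reverse=True,
    )
    return [list(weights[r]) for r in sorted(by_norm[:keep_rows])]

def storage_bits_sparse(pruned, weight_bits, index_bits):
    """Each surviving weight is stored together with an index (assumed format)."""
    nonzeros = sum(1 for row in pruned for w in row if w != 0.0)
    return nonzeros * (weight_bits + index_bits)

def storage_bits_dense(matrix, weight_bits):
    """A dense matrix needs no indices, only the weights themselves."""
    return sum(len(row) for row in matrix) * weight_bits

W = [
    [0.90, -0.10, 0.05, 0.80],
    [0.02, 0.70, -0.03, 0.01],
    [-0.60, 0.04, 0.50, -0.06],
    [0.07, -0.08, 0.03, 0.40],
]

sparse = prune_non_structured(W, keep=4)        # 4x pruning, irregular layout
dense = prune_structured_rows(W, keep_rows=2)   # 2x pruning, regular layout

# With 4-bit quantized weights and 4-bit indices (both hypothetical widths),
# compare the storage cost of the two results.
sparse_bits = storage_bits_sparse(sparse, weight_bits=4, index_bits=4)
dense_bits = storage_bits_dense(dense, weight_bits=4)
print(sparse_bits, dense_bits)
```

In this toy setting the non-structured result keeps half as many weights as the structured one, yet the per-weight index can offset that advantage once the weights themselves are quantized to a few bits. The actual trade-off depends on the index format and bit widths, which the sketch does not model faithfully.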
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
