Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

Xiaolong Ma; Sheng Lin; Shaokai Ye; Zhezhi He; Linfeng Zhang; Geng Yuan; Sia Huat Tan; Zhengang Li; Deliang Fan; Xuehai Qian; Xue Lin; Kaisheng Ma; Yanzhi Wang

arXiv:1907.02124·cs.LG·January 9, 2020·28 cites

TL;DR

This paper evaluates the benefits of non-structured weight pruning in DNNs, demonstrating that it is generally less efficient than structured pruning in terms of storage and computation, and providing a new framework for fair comparison.

Contribution

The authors develop ADMM-NN-S, a comprehensive framework for joint weight pruning and quantization, and establish a methodology for fair comparison between structured and non-structured pruning methods.

Findings

01. ADMM-NN-S achieves significant weight pruning with minimal accuracy loss.

02. Fully binarized DNNs can be lossless in accuracy in many cases.

03. Non-structured pruning is less efficient than structured pruning in storage and computation.

Abstract

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. This motivates intensive research on model compression, with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which has higher flexibility and pruning rate but incurs index accesses due to irregular weights, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization leverages the redundancy in the number of bits in weights. Compared to pruning, quantization is much more hardware-friendly, and has become a "must-do" step for FPGA and ASIC implementations. This paper provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending…
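
The storage trade-off the abstract describes (a higher pruning rate for non-structured pruning, paid for with per-weight indices) can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's evaluation methodology: the function names, the bit widths, and the layer sizes below are all hypothetical, and the model ignores details such as padding zeros needed when a relative index overflows.

```python
# Illustrative storage model only; all names and numbers are assumptions,
# not values or formulas taken from the paper.

def nonstructured_bits(kept_weights: int, weight_bits: int, index_bits: int) -> int:
    """Bits to store a non-structured sparse layer: every surviving weight
    needs its quantized value plus an index locating it in the matrix."""
    return kept_weights * (weight_bits + index_bits)


def structured_bits(kept_weights: int, weight_bits: int) -> int:
    """Bits to store a structured-pruned layer: the result is still a dense
    (smaller) matrix, so only the quantized values are stored."""
    return kept_weights * weight_bits


def break_even_advantage(weight_bits: int, index_bits: int) -> float:
    """How many times higher the non-structured pruning rate must be, under
    this simplified model, just to match structured pruning in storage."""
    return (weight_bits + index_bits) / weight_bits


if __name__ == "__main__":
    total = 1_000_000  # hypothetical layer size
    # Hypothetical: non-structured keeps 1/10 of weights, structured keeps 1/5.
    ns = nonstructured_bits(total // 10, weight_bits=4, index_bits=4)
    st = structured_bits(total // 5, weight_bits=4)
    print("non-structured bits:", ns)
    print("structured bits:    ", st)
    print("break-even pruning-rate advantage:", break_even_advantage(4, 4))
```

Under these assumed numbers, the two schemes use the same storage even though the non-structured layer keeps half as many weights. As weights are quantized to fewer bits, the index takes up a larger share of each stored entry, which is one way to read the abstract's point that pruning should be evaluated on top of quantization.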

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Pruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax