Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Tailin Liang; John Glossner; Lei Wang; Shaobo Shi; Xiaotong Zhang

arXiv:2101.09671 · cs.CV · June 16, 2021 · 50 cites

TL;DR

This survey reviews pruning and quantization techniques for deep neural network compression, highlighting their methods, trade-offs, and effectiveness in enabling efficient real-time deployment with minimal accuracy loss.

Contribution

It provides a comprehensive comparison of pruning and quantization methods, analyzing their strengths, weaknesses, and practical applications in neural network compression.

Findings

01 Pruning is static when performed offline and dynamic when performed at run-time; techniques differ in the criteria they use to remove redundant computations.

02 Quantization typically reduces precision to 8-bit integers, with lower bit widths also explored.

03 Combined pruning and quantization can further optimize network efficiency.
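The findings above can be illustrated with a minimal sketch that prunes a small weight vector and then quantizes what remains to 8-bit integers. This is not the survey's own method: the magnitude-based pruning criterion, the symmetric linear quantization scheme, and the helper names `magnitude_prune`, `quantize_int8`, and `dequantize` are all assumptions chosen for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Assumed criterion: absolute value (element-wise, static pruning).
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices sorted by ascending |w|; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization to signed 8-bit integers (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate floating-point values."""
    return [v * scale for v in q]


# Hypothetical weights, used only to exercise the functions.
weights = [0.02, -0.75, 0.40, -0.01, 1.27, 0.05, -0.33, 0.90]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned)
print(q)
```

In this sketch the pruned zeros map exactly to the integer 0, so sparsity survives quantization, and each restored value differs from its pruned original by at most half a quantization step.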

Abstract

Deep neural networks have been used in many applications and exhibit extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computational resources and energy. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases, accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise and even network-wise pruning. Quantization reduces…
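To make the pruning granularities named in the abstract more concrete, here is a minimal sketch of filter-wise pruning, which removes whole filters rather than individual weights. The L1-norm ranking criterion, the flattened-list representation of a filter, and the names `filter_l1_norms` and `prune_filters` are assumptions for illustration, not details taken from the paper.

```python
def filter_l1_norms(filters):
    """Return the L1 norm (sum of absolute weights) of each filter."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters, keep):
    """Keep the `keep` filters with the largest L1 norm (assumed criterion).

    Returns the surviving filters in their original order, plus their indices.
    """
    norms = filter_l1_norms(filters)
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore original ordering
    return [filters[i] for i in kept], kept


# Four hypothetical filters, each flattened to three weights.
filters = [
    [0.90, -0.80, 0.70],
    [0.01, 0.02, -0.01],
    [0.50, 0.00, -0.40],
    [0.03, -0.02, 0.00],
]
smaller, kept = prune_filters(filters, keep=2)
print(kept)
```

Unlike element-wise pruning, which leaves a sparse tensor of the same shape, this approach yields a smaller dense layer, so it needs no sparse storage format to save computation.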

Taxonomy

Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Advanced Image and Video Retrieval Techniques

Methods: Pruning