Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang

TL;DR
This survey reviews pruning and quantization techniques for deep neural network compression, highlighting their methods, trade-offs, and effectiveness in enabling efficient real-time deployment with minimal accuracy loss.
Contribution
It provides a comprehensive comparison of pruning and quantization methods, analyzing their strengths, weaknesses, and practical applications in neural network compression.
Findings
Pruning can be static or dynamic, with various criteria for redundancy removal.
Quantization reduces numerical precision, typically to 8-bit integers; lower bit widths are also explored.
Combined pruning and quantization can further optimize network efficiency.
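As a rough illustration of how the two techniques can be combined, the sketch below prunes a small weight vector by magnitude and then quantizes the survivors to signed 8-bit integers. It is a minimal example under stated assumptions, not the survey's own procedure: the magnitude criterion, the symmetric per-tensor scale, the 50% sparsity target, and all function names (`magnitude_prune`, `quantize_int8`, `dequantize`) are illustrative choices.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.75, 0.31, -0.04, 1.27, 0.08, -0.50, 0.003]
sparse = magnitude_prune(weights, sparsity=0.5)  # prune first
q, scale = quantize_int8(sparse)                 # then quantize what remains
approx = dequantize(q, scale)
```

Pruned weights stay exactly zero after quantization because the scheme is symmetric around zero, and each surviving weight is recovered to within half a quantization step (`scale / 2`).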
Abstract
Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise and even network-wise pruning. Quantization reduces…
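The granularity trade-off named in the abstract can be made concrete with a small static (offline) example contrasting element-wise and filter-wise pruning. This is a hedged sketch only: the magnitude threshold, the L1-norm ranking of filters, and the names `prune_elementwise` and `prune_filterwise` are assumptions for illustration, and each filter is represented as a flat list of weights.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights, used here as an assumed importance score."""
    return sum(abs(w) for w in filter_weights)


def prune_elementwise(filters, threshold):
    """Zero individual weights below `threshold`; the tensor shape is unchanged."""
    return [[w if abs(w) >= threshold else 0.0 for w in f] for f in filters]


def prune_filterwise(filters, keep):
    """Keep the `keep` filters with the largest L1 norm, preserving their order."""
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return [filters[i] for i in kept]


# Three hypothetical filters with four weights each.
filters = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.02, -0.03, 0.01],
    [0.5, 0.05, -0.4, 0.02],
]
elementwise = prune_elementwise(filters, threshold=0.1)
filterwise = prune_filterwise(filters, keep=2)
```

Element-wise pruning leaves a sparse tensor of the original shape, which needs sparse storage or kernels to pay off. Filter-wise pruning yields a smaller dense tensor, at the cost of removing whole filters at once.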
Taxonomy
Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Advanced Image and Video Retrieval Techniques
Methods: Pruning
