Pruning vs Quantization: Which is Better?
Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort

TL;DR
This paper compares neural network pruning and quantization techniques to determine which yields better compression and accuracy, providing theoretical bounds and extensive empirical results across multiple models and tasks.
Contribution
It offers the first comprehensive analytical and empirical comparison between pruning and quantization, guiding hardware design choices for neural network deployment.
Findings
Quantization outperforms pruning in most scenarios.
Pruning may be beneficial at very high compression ratios.
The paper provides theoretical bounds for pruning and quantization errors.
Abstract
Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in…
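The analytical comparison described in the abstract can be illustrated with a small numerical experiment. The sketch below is not the authors' code. It assumes standard-normal weights, magnitude pruning, and symmetric uniform quantization with the range set by the largest absolute weight. It also assumes a 16-bit baseline, so that 8-bit quantization is paired with 50% sparsity and 4-bit quantization with 75% sparsity, ignoring the storage cost of the sparsity mask. The function names are hypothetical.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value acts as the pruning threshold
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)


def uniform_quantize(w, bits):
    """Symmetric uniform quantization; range taken from max |w| (an assumption)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale


def snr_db(w, w_hat):
    """Signal-to-noise ratio of the compressed tensor, in decibels."""
    return 10 * np.log10(np.mean(w ** 2) / np.mean((w - w_hat) ** 2))


rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)  # assumed weight distribution

results = {}
for bits in (8, 4):
    sparsity = 1 - bits / 16  # assumed equal-compression pairing vs. 16-bit
    results[bits] = (
        snr_db(w, uniform_quantize(w, bits)),
        snr_db(w, magnitude_prune(w, sparsity)),
    )
    print(f"{bits}-bit quantization: {results[bits][0]:.1f} dB, "
          f"{sparsity:.0%} pruning: {results[bits][1]:.1f} dB")
```

Under these assumptions quantization gives the higher signal-to-noise ratio at both compression levels. Heavier-tailed distributions or different range settings may shift the gap, so this is only a toy check of the trend the paper reports, not a reproduction of its results.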
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Machine Learning and Data Classification
Methods: Pruning
