Pruning vs Quantization: Which is Better?

Andrey Kuzmin; Markus Nagel; Mart van Baalen; Arash Behboodi; Tijmen Blankevoort

arXiv:2307.02973·cs.LG·February 19, 2024·23 cites

TL;DR

This paper compares neural network pruning and quantization techniques to determine which yields better compression and accuracy, providing theoretical bounds and extensive empirical results across multiple models and tasks.

Contribution

It offers the first comprehensive analytical and empirical comparison between pruning and quantization, guiding hardware design choices for neural network deployment.

Findings

01. Quantization generally outperforms pruning in most scenarios.

02. Pruning may be beneficial at very high compression ratios.

03. The paper provides lower bounds for the per-layer pruning and quantization error in trained networks.

Abstract

Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question on which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in…
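The kind of comparison the abstract describes can be illustrated with a small numerical sketch. The code below is not the authors' implementation; it is a minimal example that assumes standard-normal weights, magnitude pruning, and symmetric uniform quantization with the range set to the largest absolute weight. It also assumes a 16-bit baseline, so that 8-bit quantization is matched with 50% sparsity and 4-bit quantization with 75% sparsity, ignoring the storage cost of the sparsity mask. The function names `magnitude_prune`, `uniform_quantize`, and `snr_db` are hypothetical helpers introduced for illustration.

```python
import math
import random


def magnitude_prune(weights, keep_fraction):
    """Keep the largest-magnitude weights and zero out the rest."""
    n_keep = round(len(weights) * keep_fraction)
    if n_keep <= 0:
        return [0.0] * len(weights)
    # Magnitude of the n_keep-th largest weight acts as the pruning threshold.
    threshold = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def uniform_quantize(weights, bits):
    """Symmetric uniform quantization (bits >= 2), range set to max |w|.

    Assumption: no range search or clipping optimization is performed.
    """
    levels = 2 ** (bits - 1) - 1  # number of positive integer levels
    scale = max(abs(w) for w in weights) / levels
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def snr_db(original, compressed):
    """Signal-to-noise ratio in decibels; higher means less compression error."""
    signal = sum(w * w for w in original)
    noise = sum((w - c) ** 2 for w, c in zip(original, compressed))
    return 10 * math.log10(signal / noise)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Assumed equal-compression pairs relative to a 16-bit baseline.
for bits, keep_fraction in [(8, 0.5), (4, 0.25)]:
    q_snr = snr_db(weights, uniform_quantize(weights, bits))
    p_snr = snr_db(weights, magnitude_prune(weights, keep_fraction))
    print(f"{bits}-bit quantization: {q_snr:.1f} dB | "
          f"{int((1 - keep_fraction) * 100)}% pruning: {p_snr:.1f} dB")
```

Under these assumptions the quantized weights retain a higher SNR than the pruned weights at both compression levels, which is in line with the first finding listed above. Other weight distributions, range-setting methods, or compression ratios may behave differently.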

Code & Models

Repositories

Qualcomm-AI-research/pruning-vs-quantization (PyTorch, official)

Videos

Pruning vs Quantization: Which is Better? · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Machine Learning and Data Classification

Methods: Pruning