Joint Pruning & Quantization for Extremely Sparse Neural Networks
Po-Hsiang Yu, Sih-Sian Wu, Jan P. Klopp, Liang-Gee Chen, Shao-Yi Chien

TL;DR
This paper presents a joint pruning and quantization approach for neural networks to achieve extremely high sparsity, significantly reducing memory and hardware costs while maintaining performance, especially for dense prediction tasks like stereo depth estimation.
Contribution
It introduces a two-stage pruning and quantization pipeline with a novel Taylor Score and fine-tuning mode, enabling extreme sparsity without performance loss.
Findings
Cuts almost 99% of memory demand and reduces hardware costs by up to 99.9%.
The pruning stage alone outperforms the state of the art on ResNet for CIFAR10 and ImageNet.
Effective for dense prediction tasks like stereo depth estimation.
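To make the memory figure concrete, the arithmetic below shows how sparsity and bit width multiply. The numbers (95% sparsity, 32-bit dense weights, 8-bit quantized weights) are hypothetical values chosen for illustration, not results from the paper, and the function name memory_reduction is an assumption. The estimate also ignores the index overhead of storing a sparse tensor.

```python
def memory_reduction(sparsity, bits_dense, bits_quantized):
    """Fraction of weight memory saved relative to a dense network.

    Assumption: only the kept weights are stored, each at `bits_quantized`
    bits; sparse-index overhead is ignored.
    """
    kept_fraction = 1.0 - sparsity
    return 1.0 - kept_fraction * bits_quantized / bits_dense


# Hypothetical settings, for illustration only.
pruning_only = memory_reduction(sparsity=0.95, bits_dense=32, bits_quantized=32)
quantization_only = memory_reduction(sparsity=0.0, bits_dense=32, bits_quantized=8)
joint = memory_reduction(sparsity=0.95, bits_dense=32, bits_quantized=8)

print(f"pruning only:      {pruning_only:.4f}")
print(f"quantization only: {quantization_only:.4f}")
print(f"joint:             {joint:.4f}")
```

Under these assumed settings, neither technique alone approaches a 99% saving, while the combination does, which is consistent with the claim that the two should be studied jointly.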
Abstract
We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low-cost and low-power accelerator hardware. Many practical applications involve dense prediction tasks, so we choose stereo depth estimation as the target. We propose a two-stage pruning and quantization pipeline and introduce a Taylor Score alongside a new fine-tuning mode to achieve extreme sparsity without sacrificing performance. Our evaluation shows not only that pruning and quantization should be investigated jointly, but also that almost 99% of memory demand can be cut while hardware costs can be reduced by up to 99.9%. In addition, to compare with other works, we demonstrate that our pruning stage alone beats the state-of-the-art when applied to ResNet on CIFAR10 and…
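The abstract does not define the Taylor Score or the fine-tuning mode, so the following is only a minimal sketch of the general idea of scoring weights, pruning, and then quantizing. It assumes the generic first-order Taylor criterion |w * dL/dw| as the importance score and symmetric uniform quantization; the paper's actual score, stage ordering, and fine-tuning procedure may differ. All function names and the toy weights and gradients are hypothetical.

```python
def taylor_scores(weights, grads):
    # Assumed first-order Taylor importance: |w * dL/dw| approximates the
    # change in loss if the weight w were set to zero.
    return [abs(w * g) for w, g in zip(weights, grads)]


def prune_by_score(weights, scores, sparsity):
    # Zero out the lowest-scoring fraction of weights.
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits):
    # Symmetric uniform quantization to signed `bits`-bit levels,
    # returned in dequantized (float) form.
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


# Toy, made-up values for one layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
grads = [0.1, 0.9, -0.2, 0.5, 0.05, -0.1]

scores = taylor_scores(weights, grads)
sparse = prune_by_score(weights, scores, sparsity=0.5)
quantized = quantize_uniform(sparse, bits=4)

print("sparse:   ", sparse)
print("quantized:", quantized)
```

In this toy example the weight -0.6 is pruned despite its large magnitude because its gradient is small, which is the kind of decision that separates a gradient-aware score from plain magnitude pruning. A real pipeline would fine-tune between and after these stages.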
Taxonomy
Topics: Neural Networks and Applications
Methods: Pruning · 1x1 Convolution · Batch Normalization · Average Pooling · Residual Connection · Max Pooling · Convolution · Bottleneck Residual Block · Residual Block
