Differentiable Joint Pruning and Quantization for Hardware Efficiency
Ying Wang, Yadong Lu, Tijmen Blankevoort

TL;DR
This paper introduces a differentiable method that jointly optimizes neural network pruning and quantization for hardware efficiency. It reduces computational cost more than approaches that handle the two separately, while maintaining accuracy.
Contribution
The authors propose a novel differentiable joint pruning and quantization framework that automatically balances model compression and accuracy in a single training process.
Findings
Significantly reduces Bit-Operations (BOPs) in neural networks.
Maintains high accuracy despite aggressive compression.
Outperforms two-stage optimization methods in efficiency and accuracy.
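To make the BOP metric concrete, here is a minimal sketch of how a BOP count for one convolution layer is commonly computed: multiply-accumulate operations (MACs) times weight bit-width times activation bit-width. This convention is an assumption here, since the summary above does not spell out the formula. The function names and layer sizes are hypothetical and not taken from the paper.

```python
def conv_macs(c_in, c_out, kernel_h, kernel_w, out_h, out_w):
    """Multiply-accumulate count of a dense 2D convolution (no groups)."""
    return c_in * c_out * kernel_h * kernel_w * out_h * out_w


def bops(macs, weight_bits, act_bits):
    """Bit-operations under the assumed convention MACs * b_w * b_a."""
    return macs * weight_bits * act_bits


# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 56x56 output map.
baseline = bops(conv_macs(64, 64, 3, 3, 56, 56), weight_bits=32, act_bits=32)

# Same layer after pruning 16 output channels and quantizing to
# 8-bit weights and 4-bit activations (illustrative numbers only).
compressed = bops(conv_macs(64, 48, 3, 3, 56, 56), weight_bits=8, act_bits=4)

compression_ratio = baseline / compressed
print(f"baseline BOPs:   {baseline}")
print(f"compressed BOPs: {compressed}")
print(f"BOP reduction:   {compression_ratio:.1f}x")
```

Under this convention pruning and quantization multiply: removing channels shrinks the MAC term and lowering bit-widths shrinks the other two factors, which is why a joint search over both can trade one against the other.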
Abstract
We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem, trading off between model pruning and quantization automatically for hardware efficiency. DJPQ incorporates variational information bottleneck based structured pruning and mixed-bit precision quantization into a single differentiable loss function. In contrast to previous works which consider pruning and quantization separately, our method enables users to find the optimal trade-off between both in a single training procedure. To utilize the method for more efficient hardware inference, we extend DJPQ to integrate structured pruning with power-of-two bit-restricted quantization. We show that DJPQ significantly reduces the number of Bit-Operations (BOPs) for several networks while maintaining the top-1 accuracy of original…
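The power-of-two bit restriction mentioned in the abstract can be illustrated with a small sketch that snaps a continuous, learned bit-width to the nearest power of two. The rounding rule (nearest in the log2 domain), the function name, and the 2 to 32 bit range are all assumptions made for illustration; the abstract does not say how DJPQ itself enforces the restriction.

```python
import math


def snap_to_power_of_two(bits, min_bits=2, max_bits=32):
    """Round a continuous bit-width to the nearest power of two.

    Assumption: "nearest" is measured in the log2 domain, and
    min_bits / max_bits are themselves powers of two.
    """
    clipped = min(max(bits, min_bits), max_bits)
    exponent = round(math.log2(clipped))
    return 2 ** exponent


# Hypothetical learned bit-widths for four layers.
learned = [5.3, 6.0, 1.0, 40.0]
restricted = [snap_to_power_of_two(b) for b in learned]
print(restricted)
```

Bit-widths restricted this way (2, 4, 8, 16, 32) line up with the operand sizes that common hardware supports natively, which is the motivation the abstract gives for the extension.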
Taxonomy
Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques
Methods: Pruning
