Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks
Jun Nishikawa, Ryoji Ikegaya

TL;DR
This paper introduces a pruning method called PfQ that improves the fine-tuning of quantized deep neural networks by removing filters that hinder accuracy recovery, yielding better accuracy without additional learned parameters or hyper-parameters.
Contribution
The paper proposes a novel pruning technique, PfQ, that enhances quantized DNN fine-tuning by addressing batch normalization issues without additional hyper-parameters.
Findings
Achieves higher accuracy with similar model size compared to conventional methods
Effectively mitigates batch normalization effects during quantization
Improves fine-tuning performance of low-bit quantized DNNs
Abstract
Deep Neural Networks (DNNs) have many parameters and activation data, both of which are expensive to implement. One method to reduce the size of a DNN is to quantize the pre-trained model by using a low-bit expression for weights and activations, and then to use fine-tuning to recover the drop in accuracy. However, it is generally difficult to train neural networks that use low-bit expressions. One reason is that the weights in the middle layers of the DNN have a wide dynamic range, so when this range is quantized into a few bits the step size becomes large, which leads to a large quantization error and ultimately a large degradation in accuracy. To solve this problem, this paper makes the following three contributions without using any additional learning parameters or hyper-parameters. First, we analyze how batch normalization, which causes the aforementioned problem, disturbs the…
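The relationship between dynamic range and step size described in the abstract can be illustrated with a small sketch. This is not the paper's PfQ method; it is a generic symmetric uniform quantizer, and the function names, the 4-bit setting, and the synthetic weights are all assumptions chosen for illustration. It shows that a single large weight widens the range, enlarges the step, and increases the error on all the other weights.

```python
import random


def uniform_quantize(values, num_bits):
    """Symmetric uniform quantization over the observed range of `values`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (num_bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    step = max_abs / levels
    quantized = [round(v / step) * step for v in values]
    return quantized, step


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Synthetic weights (an assumption): small, Gaussian-distributed values.
narrow = [rng.gauss(0.0, 0.05) for _ in range(1000)]

# The same weights plus one large outlier that widens the dynamic range.
wide = narrow + [4.0]

q_narrow, step_narrow = uniform_quantize(narrow, num_bits=4)
q_wide, step_wide = uniform_quantize(wide, num_bits=4)

# Measure the error on the 1000 shared weights only, so that the
# comparison isolates the effect of the larger step size.
err_narrow = mean_squared_error(narrow, q_narrow)
err_wide = mean_squared_error(narrow, q_wide[:1000])

print(f"step (narrow range): {step_narrow:.4f}, MSE: {err_narrow:.6f}")
print(f"step (wide range):   {step_wide:.4f}, MSE: {err_wide:.6f}")
```

In this toy setting the outlier sets the step to 4/7, so the small weights are rounded with a much coarser grid than when the range is set by the small weights alone.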
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning
