A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks via Learned Weights Statistics
Prasen Kumar Sharma, Arun Abraham, Vikram Nelvoy Rajendiran

TL;DR
This paper introduces a zero-shot quantization method for deep CNNs that leverages pretrained weight distributions and data distillation, eliminating the need for original data or BN statistics, and improves accuracy over existing zero-shot methods.
Contribution
It proposes the first generalized zero-shot quantization framework using pretrained weights for activation range calibration, applicable to networks without BN layers.
Findings
Outperforms existing zero-shot methods by ~33% in accuracy on MobileNetV2.
Effective across multiple models and open-source frameworks.
First to address post-training zero-shot quantization for unnormalized networks.
Abstract
Quantizing the floating-point weights and activations of deep convolutional neural networks to a fixed-point representation reduces memory footprint and inference time. Recently, efforts have been afoot towards zero-shot quantization that does not require original unlabelled training samples of a given task. The best published works rely heavily on the learned batch normalization (BN) parameters to infer the range of the activations for quantization. In particular, these methods are built upon either an empirical estimation framework or a data distillation approach for computing the range of the activations. However, the performance of such schemes severely degrades when presented with a network that does not accommodate BN layers. In this line of thought, we propose a generalized zero-shot quantization (GZSQ) framework that neither requires original data nor relies on BN layer…
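To illustrate why the activation range matters for fixed-point quantization, here is a minimal sketch of generic uniform affine (asymmetric) quantization in plain Python. It is not the GZSQ method from the paper: the range is simply passed in by hand, whereas the paper is concerned with how to estimate that range without data or BN statistics. The function names `quant_params`, `quantize`, and `dequantize` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def quant_params(x_min: float, x_max: float, num_bits: int = 8) -> Tuple[float, int]:
    """Return (scale, zero_point) for unsigned uniform affine quantization."""
    # Widen the range to include zero so that 0.0 maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    q_max = 2 ** num_bits - 1
    scale = (x_max - x_min) / q_max
    if scale == 0.0:
        scale = 1.0  # degenerate range (all zeros): any positive scale works
    zero_point = int(round(-x_min / scale))
    return scale, zero_point


def quantize(xs: List[float], scale: float, zero_point: int, num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clipping out-of-range values."""
    q_max = 2 ** num_bits - 1
    return [min(max(int(round(x / scale)) + zero_point, 0), q_max) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in qs]


if __name__ == "__main__":
    # Assumed activation range of [-1, 3]; in practice this must be calibrated.
    scale, zp = quant_params(-1.0, 3.0)
    print(zp)                                    # 64
    print(quantize([-1.0, 0.0, 3.0], scale, zp))  # [0, 64, 255]
    print(quantize([10.0], scale, zp))            # [255] (clipped)
```

A range that is too narrow clips large activations (as with `10.0` above), while one that is too wide wastes integer levels and increases rounding error, which is why range calibration is central to post-training quantization.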
Taxonomy
Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · 1x1 Convolution · Inverted Residual Block · Convolution · Average Pooling · Batch Normalization
