ZeroQ: A Novel Zero Shot Quantization Framework
Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

TL;DR
ZeroQ is a zero-shot neural network quantization framework that optimizes a distilled dataset to enable mixed-precision quantization without access to original data, achieving high accuracy with minimal computational overhead.
Contribution
ZeroQ introduces a zero-shot quantization method that uses a distilled dataset together with a Pareto-frontier-based search to set the mixed-precision bit-widths automatically, eliminating the need for training data.
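To make the Pareto-frontier idea concrete, here is a minimal sketch of how a mixed-precision setting could be chosen under a model-size budget. It is not ZeroQ's actual procedure: the summary above does not say how layer sensitivity is measured or how the search is run, so the per-layer sensitivity table, the layer sizes, the function names `pareto_frontier` and `pick_under_budget`, and the brute-force enumeration are all assumptions made for illustration.

```python
from itertools import product


def pareto_frontier(param_counts, sensitivity, bit_choices):
    """Enumerate every per-layer bit assignment and keep the non-dominated ones.

    Returns (size_in_bits, total_sensitivity, bits) tuples sorted by size.
    """
    points = []
    for bits in product(bit_choices, repeat=len(param_counts)):
        size = sum(n * b for n, b in zip(param_counts, bits))
        total = sum(sensitivity[i][b] for i, b in enumerate(bits))
        points.append((size, total, bits))

    # Sort by size, then sensitivity. A point survives only if it is strictly
    # less sensitive than every configuration that is no larger than it.
    points.sort(key=lambda p: (p[0], p[1]))
    frontier, best_so_far = [], float("inf")
    for size, total, bits in points:
        if total < best_so_far:
            frontier.append((size, total, bits))
            best_so_far = total
    return frontier


def pick_under_budget(frontier, budget_bits):
    """Return the least sensitive frontier point that fits the size budget."""
    feasible = [p for p in frontier if p[0] <= budget_bits]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical three-layer model: parameter counts and made-up sensitivities
# (lower is better) for each candidate bit-width.
param_counts = [1000, 4000, 2000]
sensitivity = [
    {2: 0.90, 4: 0.20, 8: 0.01},
    {2: 0.06, 4: 0.05, 8: 0.01},
    {2: 0.60, 4: 0.10, 8: 0.02},
]
frontier = pareto_frontier(param_counts, sensitivity, bit_choices=(2, 4, 8))

# Budget equal to a uniform 4-bit model.
budget_bits = 4 * sum(param_counts)
chosen_size, chosen_sensitivity, chosen_bits = pick_under_budget(frontier, budget_bits)
print(chosen_bits, chosen_size, round(chosen_sensitivity, 4))
```

With these made-up numbers the search gives the large but insensitive middle layer the fewest bits and spends the savings on the first layer. Enumeration grows exponentially with the number of layers, so a real network would need a cheaper search than this brute-force loop.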
Findings
Achieves 1.71% higher accuracy on MobileNetV2 compared to DFQ.
Supports both uniform and mixed-precision quantization.
Completes the entire quantization process in less than 30 seconds, which the paper reports as about 0.5% of the time for one epoch of ResNet50 training on ImageNet.
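As a rough illustration of what uniform quantization at different precisions does to a layer's weights, the sketch below applies a generic min/max affine quantizer and then maps the result back to floats. This is a textbook scheme rather than ZeroQ's exact quantizer, which this summary does not specify; the function names and the weight values are hypothetical.

```python
def quantize_dequantize(values, num_bits):
    """Snap each value to one of 2**num_bits evenly spaced levels in [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant tensor is represented exactly by a single level.
        return list(values)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]


def max_error(values, num_bits):
    """Largest absolute difference between a value and its quantized version."""
    restored = quantize_dequantize(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights for one layer.
weights = [-1.0, -0.4, 0.0, 0.3, 1.0]
dequantized_8 = quantize_dequantize(weights, 8)
dequantized_2 = quantize_dequantize(weights, 2)

for bits in (8, 4, 2):
    print(bits, "bits -> max error", max_error(weights, bits))
```

Uniform quantization uses one bit-width for every layer, while a mixed-precision setting would call a quantizer like this with a different `num_bits` per layer.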
Abstract
Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both…
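The idea of optimizing inputs until their statistics match those stored in batch normalization layers can be sketched with a toy example. The code below is an assumption-heavy simplification, not the authors' implementation: the "network" is the identity followed by a single batch-norm layer, so the batch statistics are those of the input itself and the gradient can be written by hand. The target mean and standard deviation, the learning rate, and all function names are hypothetical.

```python
import math
import random


def batch_stats(x):
    """Mean and (population) standard deviation of a batch of scalars."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return mean, std


def stat_loss(x, target_mean, target_std):
    """Squared mismatch between batch statistics and the stored BN statistics."""
    mean, std = batch_stats(x)
    return (mean - target_mean) ** 2 + (std - target_std) ** 2


def distill(x, target_mean, target_std, lr=1.0, steps=200):
    """Gradient descent on the inputs themselves; no real data is involved.

    Assumes the batch never collapses to a single value (std > 0).
    """
    x = list(x)
    n = len(x)
    for _ in range(steps):
        mean, std = batch_stats(x)
        # d(mean)/dx_i = 1/n and d(std)/dx_i = (x_i - mean) / (n * std)
        x = [
            v - lr * (2 * (mean - target_mean) / n
                      + 2 * (std - target_std) * (v - mean) / (n * std))
            for v in x
        ]
    return x


# Start from Gaussian noise and pull it toward hypothetical BN statistics.
rng = random.Random(0)
initial = [rng.gauss(0.0, 1.0) for _ in range(8)]
target_mean, target_std = 2.0, 0.5

distilled = distill(initial, target_mean, target_std)
print("loss before:", stat_loss(initial, target_mean, target_std))
print("loss after: ", stat_loss(distilled, target_mean, target_std))
```

In a real network the same kind of loss would be summed over every batch-norm layer and minimized by backpropagating through the frozen model, which is what an autodiff framework would handle in place of the hand-written gradient here.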
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Test · Label Smoothing · RMSProp · Auxiliary Classifier · Inception-v3 Module · Dropout · Inception-v3 · Grouped Convolution · Depthwise Separable Convolution · 1x1 Convolution
