MRQ: Support Multiple Quantization Schemes through Model Re-Quantization
Manasa Manohara, Sankalp Dayal, Tariq Afzal, Rahul Bakshi, Kahkuen Fu

TL;DR
This paper introduces MRQ, a re-quantization method that transforms existing quantized models to support multiple quantization schemes efficiently, reducing the need for retraining and enabling deployment on diverse hardware.
Contribution
The paper proposes a novel re-quantization approach that supports multiple quantization schemes, improving flexibility and reducing computational costs compared to traditional quantization methods.
Findings
Re-quantized MobileNetV2 with less than 0.64 accuracy loss.
Supported multiple schemes including symmetric and power-of-2 scales.
Successfully deployed on the neural network accelerator (NNA) in Echo Show devices.
Abstract
Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like TensorFlow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] support only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardware, mainly because quantization requirements differ slightly from one target to another. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms them to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale).…
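To make the two conversions named in the abstract concrete, here is a minimal sketch of a naive baseline: dequantize the asymmetric codes, then quantize again with a zero point of 0 and, optionally, a scale rounded up to a power of two. This is an illustration only and not the MRQ procedure, which the truncated abstract above does not spell out. The function names, the example tensor, and its scale and zero point are all hypothetical.

```python
import math

def dequantize(q, scale, zero_point):
    """Map integer codes back to real values: r = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]

def pow2_scale(scale):
    """Round a scale UP to the next power of two so the range still covers the data."""
    return 2.0 ** math.ceil(math.log2(scale))

def requantize_symmetric(q, scale, zero_point, bits=8, power_of_2=False):
    """Naive re-quantization (assumed baseline, not MRQ itself):
    asymmetric codes -> symmetric signed codes with zero point 0."""
    real = dequantize(q, scale, zero_point)
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit signed
    max_abs = max(abs(r) for r in real)
    new_scale = max_abs / qmax if max_abs > 0 else 1.0
    if power_of_2:
        new_scale = pow2_scale(new_scale)
    # Round to the nearest code and clamp to the symmetric range [-qmax, qmax].
    new_q = [max(-qmax, min(qmax, round(r / new_scale))) for r in real]
    return new_q, new_scale

# Hypothetical asymmetric uint8 tensor with scale 0.02 and zero point 128.
q_asym = [0, 64, 128, 192, 255]
scale, zp = 0.02, 128

q_sym, s_sym = requantize_symmetric(q_asym, scale, zp)
q_p2, s_p2 = requantize_symmetric(q_asym, scale, zp, power_of_2=True)

print("symmetric:           ", q_sym, s_sym)
print("symmetric + pow2 scale:", q_p2, s_p2)
```

Rounding the scale up rather than to the nearest power of two avoids clipping, at the cost of a coarser grid. Each conversion adds its own rounding error on top of the original quantization error, which is why a naive round trip like this one can cost accuracy.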
Taxonomy
Topics: Advanced Neural Network Applications · Speech Recognition and Synthesis
Methods: Depthwise Convolution · Batch Normalization · Pointwise Convolution · Depthwise Separable Convolution · 1x1 Convolution · Inverted Residual Block · Average Pooling · Convolution
