MRQ: Support Multiple Quantization Schemes through Model Re-Quantization

Manasa Manohara; Sankalp Dayal; Tariq Afzal; Rahul Bakshi; Kahkuen Fu

arXiv:2308.01867·cs.LG·August 7, 2023

Open Access

TL;DR

This paper introduces MRQ, a re-quantization method that transforms existing quantized models to support multiple quantization schemes efficiently, reducing the need for retraining and enabling deployment on diverse hardware.

Contribution

The paper proposes a novel re-quantization approach that supports multiple quantization schemes, improving flexibility and reducing computational costs compared to traditional quantization methods.

Findings

01

MobileNetV2 was re-quantized with less than 0.64 units of accuracy loss.

02

Multiple target schemes are supported, including symmetric quantization and power-of-2 scales.

03

Re-quantized models were successfully deployed on the NNA in Echo Show devices.

Abstract

Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like TensorFlow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] support only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardware, mainly because each target has slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms them to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale).…
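To make the two conversions named in the abstract concrete (asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale), here is a minimal sketch in plain Python. It is not the MRQ algorithm from the paper: it simply dequantizes an asymmetric per-tensor tensor and re-quantizes it onto a symmetric grid, optionally rounding the scale up to a power of two. All function names (`quantize_asymmetric`, `requantize_symmetric`, `power_of_two_scale`) and the sample weights are hypothetical and chosen for illustration only.

```python
import math

def quantize_asymmetric(values, num_bits=8):
    """Asymmetric per-tensor quantization: real ~= scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must contain 0 so that zero maps to an integer.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [scale * (x - zero_point) for x in q]

def power_of_two_scale(scale):
    """Round a positive scale up to the next power of two (range stays covered)."""
    return 2.0 ** math.ceil(math.log2(scale))

def requantize_symmetric(q, scale, zero_point, num_bits=8, pow2=False):
    """Naive re-quantization to a signed symmetric grid (zero_point fixed at 0)."""
    real = dequantize(q, scale, zero_point)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in real)
    new_scale = max_abs / qmax if max_abs > 0 else 1.0
    if pow2:
        new_scale = power_of_two_scale(new_scale)
    new_q = [max(-qmax, min(qmax, int(round(v / new_scale)))) for v in real]
    return new_q, new_scale

# Illustrative weights (assumed values, not taken from the paper).
weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
q, s, zp = quantize_asymmetric(weights)                  # asymmetric uint8
sym_q, sym_s = requantize_symmetric(q, s, zp)            # symmetric int8
p2_q, p2_s = requantize_symmetric(q, s, zp, pow2=True)   # symmetric + power-of-2 scale
```

Rounding the scale up rather than to the nearest power of two is a design choice in this sketch: it avoids clipping the largest value at the cost of a coarser grid. A real toolkit would also need to handle activations, biases, and per-layer constraints, which this example does not attempt.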

Taxonomy

Topics: Advanced Neural Network Applications · Speech Recognition and Synthesis

Methods: Depthwise Convolution · Batch Normalization · Pointwise Convolution · Depthwise Separable Convolution · 1x1 Convolution · Inverted Residual Block · Average Pooling · Convolution