On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

TL;DR
This paper introduces an on-chip hardware-aware quantization framework for mixed-precision neural networks, enabling accurate and efficient deployment on edge devices by directly measuring hardware efficiency and estimating accuracy impacts without relying on high-performance simulations.
Contribution
It proposes a novel on-chip quantization pipeline and mask-guided accuracy estimation to optimize mixed-precision models directly on hardware, reducing reliance on simulations and high-power computing.
Findings
Achieves 70% and 73% accuracy on ResNet-18 and MobileNetV3, respectively.
Reduces latency by 15-30% compared to INT8 quantization.
Operates effectively on various architectures and compression ratios.
Abstract
Low-bit quantization emerges as one of the most promising compression approaches for deploying deep neural networks on edge devices. Mixed-precision quantization leverages a mixture of bit-widths to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization methods rely on simulations in high-performance devices to achieve accuracy and efficiency trade-offs in immense search spaces. This leads to a non-negligible gap between the estimated efficiency metrics and the actual hardware that makes quantized models far away from the optimal accuracy and efficiency, and also causes the quantization process to rely on additional high-performance devices. In this paper, we propose an On-Chip Hardware-Aware Quantization (OHQ) framework, performing hardware-aware mixed-precision quantization on deployed edge devices to achieve accurate and…
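As a rough illustration of what "a mixture of bit-widths" means in the abstract, the sketch below applies generic symmetric uniform quantization with a different bit-width per layer and reports the resulting error and storage cost. It is not the OHQ method: the layer names, weight values, and bit assignment are made up for the example, and the helpers `quantize` and `mean_squared_error` are hypothetical.

```python
# Illustrative sketch only: generic symmetric uniform quantization with a
# per-layer bit-width. This is NOT the OHQ algorithm from the paper.

def quantize(weights, bits):
    """Fake-quantize a list of floats to signed `bits`-bit levels and back."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed range.
    levels = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in levels]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical layers and bit-width assignment (names and values are made up).
layers = {
    "conv1": [0.91, -0.42, 0.13, -0.77, 0.05, 0.38],
    "conv2": [0.20, -0.15, 0.33, -0.02, 0.27, -0.31],
}
bit_config = {"conv1": 8, "conv2": 4}   # mixed precision: one width per layer

# Per-layer quantization error under the chosen bit-widths.
errors = {
    name: mean_squared_error(w, quantize(w, bit_config[name]))
    for name, w in layers.items()
}
# Total weight storage in bits for this configuration.
model_bits = sum(bit_config[name] * len(w) for name, w in layers.items())
```

A mixed-precision search would compare many such `bit_config` assignments, trading the per-layer error against a cost such as `model_bits` or, as the abstract argues, latency measured on the target hardware.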
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Analog and Mixed-Signal Circuit Design
Methods: Attentive Walk-Aggregating Graph Neural Network · Batch Normalization · Depthwise Convolution · Sigmoid Activation · Pointwise Convolution · Depthwise Separable Convolution · Dense Connections · Average Pooling · ReLU6
