HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

TL;DR
This paper introduces HAQ, a reinforcement learning-based framework that automatically determines hardware-aware mixed-precision quantization policies for neural networks, optimizing for latency and energy efficiency across different hardware architectures.
Contribution
The paper presents a fully automated, hardware-aware quantization method that uses reinforcement learning to choose a bitwidth for each layer. The policy adapts to the specific neural network and hardware architecture, using feedback from the hardware accelerator in the design loop.
Findings
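Reduced latency by 1.4-1.95x compared with fixed 8-bit quantization, with negligible loss of accuracy
Reduced energy consumption by 1.9x compared with fixed 8-bit quantization
Showed that the optimal quantization policy differs across hardware architectures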
Abstract
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators are beginning to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore a vast design space, trading off among accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore the different hardware architectures and quantize all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such…
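To make the idea of a per-layer bitwidth concrete, here is a minimal Python sketch of applying a mixed-precision policy to a toy model. It is not the HAQ implementation: the function names (`quantize_symmetric`, `apply_policy`, `model_size_bits`), the toy layer weights, and the choice of uniform symmetric quantization are all assumptions made for illustration, and the sketch needs at least 2 bits per layer because one bit is spent on the sign. In HAQ the policy would come from the reinforcement learning agent; here it is written by hand.

```python
from typing import Dict, List


def quantize_symmetric(weights: List[float], bits: int) -> List[float]:
    """Uniform symmetric quantization; returns de-quantized floats.

    Assumption for this sketch: signed integer grid, so bits >= 2.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (one is the sign)")
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    levels = 2 ** (bits - 1) - 1      # largest integer magnitude, e.g. 127 for 8 bits
    scale = max_abs / levels          # width of one quantization step
    quantized = []
    for w in weights:
        q = round(w / scale)                   # nearest integer level
        q = max(-levels, min(levels, q))       # clamp to the representable range
        quantized.append(q * scale)            # map back to a float value
    return quantized


def apply_policy(layers: Dict[str, List[float]],
                 policy: Dict[str, int]) -> Dict[str, List[float]]:
    """Quantize each layer with its own bitwidth from the policy."""
    return {name: quantize_symmetric(w, policy[name]) for name, w in layers.items()}


def model_size_bits(layers: Dict[str, List[float]], policy: Dict[str, int]) -> int:
    """Total weight storage in bits under a given per-layer policy."""
    return sum(len(w) * policy[name] for name, w in layers.items())


# Hypothetical two-layer model and a hand-written mixed-precision policy.
layers = {
    "conv1": [0.5, -1.0, 0.25, 0.75],
    "fc": [0.1, -0.2, 0.3, -0.4],
}
policy = {"conv1": 8, "fc": 4}
quantized = apply_policy(layers, policy)
```

Comparing `model_size_bits(layers, policy)` with the same call under a uniform 8-bit policy shows how lowering the bitwidth of individual layers shrinks the model. Latency and energy, by contrast, depend on the target hardware, which is why the abstract describes taking the accelerator's feedback in the design loop.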
Videos
[CVPR 2019 Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision (YouTube)
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing
