HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Kuan Wang; Zhijian Liu; Yujun Lin; Ji Lin; Song Han

arXiv:1811.08886 · cs.CV · April 9, 2019 · 49 cites


TL;DR

This paper introduces HAQ, a reinforcement learning-based framework that automatically determines hardware-aware mixed-precision quantization policies for neural networks, optimizing for latency and energy efficiency across different hardware architectures.

Contribution

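The paper presents a fully automated, hardware-aware quantization method that uses reinforcement learning to choose a bitwidth for each layer. The resulting policy is specialized to a given neural network and hardware architecture, and the hardware accelerator's feedback is taken directly into the design loop.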
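The Contribution describes a search in which a policy proposes per-layer bitwidths and the target hardware reports their cost. The Python sketch below illustrates two pieces of such a loop under stated assumptions. The first piece maps a continuous agent action in [0, 1] to a bitwidth in the 1-8 bit range. The second lowers bitwidths layer by layer until a latency budget is met. Both follow our reading of the HAQ paper rather than its released code. The latency lookup table and its numbers, the function names, and the budget are all hypothetical, and no RL agent is included.

```python
import math

# Bitwidth range taken from the abstract's "mixed precision (1-8 bits)".
B_MIN, B_MAX = 1, 8


def action_to_bitwidth(a: float, b_min: int = B_MIN, b_max: int = B_MAX) -> int:
    """Map a continuous action a in [0, 1] to an integer bitwidth in [b_min, b_max].

    Splits [0, 1] into (b_max - b_min + 1) equal bins; a == 1.0 is clamped to b_max.
    """
    a = min(max(a, 0.0), 1.0)
    b = b_min + math.floor(a * (b_max - b_min + 1))
    return min(b, b_max)


def policy_latency(bits: list[int], table: list[dict[int, float]]) -> float:
    """Total latency of a policy, read from a hypothetical per-layer lookup table."""
    return sum(table[i][b] for i, b in enumerate(bits))


def enforce_budget(bits: list[int], table: list[dict[int, float]],
                   budget: float, b_min: int = B_MIN) -> list[int]:
    """Lower bitwidths one step at a time, sweeping layer by layer, until under budget."""
    bits = list(bits)  # do not mutate the caller's policy
    while policy_latency(bits, table) > budget:
        if all(b == b_min for b in bits):
            raise ValueError("budget cannot be met even at the minimum bitwidth")
        for i in range(len(bits)):
            if bits[i] > b_min:
                bits[i] -= 1
            if policy_latency(bits, table) <= budget:
                break
    return bits


# Hypothetical latency table: cost of layer i at bitwidth b scales linearly with b.
base_ms = [2.0, 4.0, 1.0]  # made-up per-layer latencies (ms) at 8 bits
table = [{b: c * b / 8 for b in range(B_MIN, B_MAX + 1)} for c in base_ms]

bits = [action_to_bitwidth(a) for a in [1.0, 1.0, 0.5]]  # [8, 8, 5]
constrained = enforce_budget(bits, table, budget=5.0)    # [6, 6, 4]
print(bits, constrained, policy_latency(constrained, table))
```

In a complete search, an agent would propose the actions and be rewarded on the accuracy of the budget-constrained model. That part depends on training and a real hardware model, so it is omitted here.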

Findings

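1. Reduced latency by 1.4-1.95x compared with fixed-bitwidth (8-bit) quantization, with negligible loss of accuracy

2. Reduced energy consumption by 1.9x compared with fixed-bitwidth (8-bit) quantization

3. Showed that the optimal quantization policy differs markedly across hardware architectures (e.g., edge vs. cloud accelerators) and resource constraints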

Abstract

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computational efficiency. This raises a major challenge: finding the optimal bitwidth for each layer. Doing so requires domain experts to explore a vast design space that trades off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers uniformly. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework. It leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such…
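The abstract's core idea is to use a different bitwidth for each layer instead of quantizing every layer uniformly. A small stdlib-only Python sketch can illustrate it. The sketch assumes symmetric linear quantization, with each layer's clip value set to its maximum absolute weight. That clipping rule, the function names, and the example policy are illustrative assumptions, not HAQ's implementation. It also leaves out 1-bit layers, which need a separate binarization scheme.

```python
def quantize_layer(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric linear quantization of one layer's weights to `bits` bits.

    Returns integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1] and the scale,
    so that code * scale approximates the original weight.
    """
    if not 2 <= bits <= 8:
        raise ValueError("this sketch handles 2-8 bits; 1-bit needs a binarization scheme")
    clip = max(abs(w) for w in weights) or 1.0  # assumption: clip at max |w|
    levels = 2 ** (bits - 1) - 1                # e.g. 127 at 8 bits, 1 at 2 bits
    scale = clip / levels
    codes = [max(-levels, min(levels, round(w / scale))) for w in weights]
    return codes, scale


def apply_policy(layers: list[list[float]], policy: list[int]) -> list[tuple[list[int], float]]:
    """Quantize each layer with its own bitwidth (a mixed-precision policy)."""
    if len(layers) != len(policy):
        raise ValueError("need exactly one bitwidth per layer")
    return [quantize_layer(w, b) for w, b in zip(layers, policy)]


# Hypothetical two-layer model: first layer kept at 8 bits, second pushed to 2 bits.
layers = [[1.0, -1.0, 0.0, 0.37], [0.9, -0.6, 0.1]]
quantized = apply_policy(layers, [8, 2])
for (codes, scale), w in zip(quantized, layers):
    print(codes, [round(c * scale, 3) for c in codes], w)
```

Lowering a layer's bitwidth coarsens its grid. In this example, the 2-bit layer can only represent -0.9, 0 and 0.9. That is the accuracy side of the trade-off the abstract describes, while the latency and energy side comes from the hardware.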


Code & Models

Videos

[CVPR 2019 Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision (YouTube)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing