Hardware-Centric AutoML for Mixed-Precision Quantization

Kuan Wang; Zhijian Liu; Yujun Lin; Ji Lin; Song Han

arXiv:2008.04878 · cs.CV · August 14, 2020

TL;DR

This paper presents a hardware-aware AutoML framework that uses reinforcement learning to choose a per-layer mixed-precision quantization policy for a neural network, specializing the quantized model to the efficiency characteristics of a specific hardware architecture.

Contribution

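Introduces HAQ (Hardware-Aware Automated Quantization), a reinforcement learning framework that automatically searches for per-layer mixed-precision quantization policies using feedback from the target hardware, instead of quantizing all layers with one uniform bitwidth as conventional approaches do.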
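The search loop behind this contribution can be sketched in a few functions. This is a simplified sketch, not the authors' code. HAQ's agent is a DDPG actor that emits one continuous action per layer; here fixed actions stand in for the agent. The per-layer latency table is hypothetical and only stands in for the accelerator's feedback. The action-to-bitwidth mapping, the sequential lowering of bitwidths until the resource budget is met, and the accuracy-only reward lam * (acc_quant - acc_orig) with lam = 0.1 follow the HAQ paper's description as we understand it. The 2 to 8 bit search range and all function names are illustrative assumptions.

```python
import math

# Assumed per-layer bitwidth search range (illustrative).
B_MIN, B_MAX = 2, 8


def action_to_bits(a: float, b_min: int = B_MIN, b_max: int = B_MAX) -> int:
    """Map a continuous action a in [0, 1] to an integer bitwidth.

    Computes b_min - 0.5 + a * (b_max - b_min + 1), rounds halves up,
    then clamps so that a = 1.0 still yields b_max.
    """
    raw = b_min - 0.5 + a * (b_max - b_min + 1)
    return max(b_min, min(b_max, math.floor(raw + 0.5)))


def total_latency(bits, latency_table):
    """Sum per-layer latencies looked up by (layer index, bitwidth)."""
    return sum(latency_table[i][b] for i, b in enumerate(bits))


def enforce_budget(bits, latency_table, budget):
    """Lower bitwidths one layer at a time until the latency budget is met."""
    bits = list(bits)
    while total_latency(bits, latency_table) > budget:
        reduced = False
        for i in range(len(bits)):
            if bits[i] > B_MIN:
                bits[i] -= 1
                reduced = True
                if total_latency(bits, latency_table) <= budget:
                    return bits
        if not reduced:
            raise ValueError("budget cannot be met even at the minimum bitwidth")
    return bits


def reward(acc_quant: float, acc_orig: float, lam: float = 0.1) -> float:
    """Accuracy-only reward; the resource limit is enforced by enforce_budget()."""
    return lam * (acc_quant - acc_orig)


# Hypothetical hardware feedback: each layer's latency (ms) scales linearly
# with bitwidth from an assumed 8-bit cost. Real accelerators are not this simple.
base_ms = [4.0, 2.0, 6.0]
latency_table = [{b: base * b / 8 for b in range(B_MIN, B_MAX + 1)} for base in base_ms]

# Fixed actions stand in for the agent's per-layer outputs.
actions = [0.9, 0.5, 1.0]
proposed = [action_to_bits(a) for a in actions]
policy = enforce_budget(proposed, latency_table, budget=9.0)
print(proposed, policy, total_latency(policy, latency_table))  # [8, 5, 8] [6, 3, 7] 9.0
```

Keeping the resource limit out of the reward and enforcing it by construction means every policy the agent is scored on already fits the budget, so the reward only has to rank accuracy.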

Findings

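01. Reduced latency by up to 1.95x compared with fixed-bitwidth (8-bit) quantization, with negligible loss of accuracy

02. Lowered energy consumption by 1.9x compared with the same 8-bit baseline

03. Produced quantization policies specialized to different hardware architectures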

Abstract

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators are beginning to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: doing so requires domain experts to explore a vast design space trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers uniformly. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy and takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such…
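Choosing "the bitwidth for each layer" can be made concrete with linear quantization. The sketch below is a minimal simulated ("fake") quantizer, assuming symmetric per-tensor scaling and, for simplicity, a clipping value equal to the largest weight magnitude. The HAQ paper describes a linear quantizer of this form but, as we understand it, selects the clipping value by minimizing the KL divergence between the original and quantized weights. The function name, layer names, weights, and policy are hypothetical.

```python
def linear_quantize(weights, bits, clip=None):
    """Simulated symmetric k-bit linear quantization with clipping.

    Values are clamped to [-clip, clip], divided by the step
    scale = clip / (2**(bits - 1) - 1), rounded to the nearest integer
    (Python's round() sends exact halves to the even integer), and rescaled,
    so the result stays in floating point but lies on the k-bit grid.
    """
    if bits < 2:
        raise ValueError("symmetric linear quantization needs at least 2 bits")
    if clip is None:
        # Assumption: clip at the largest magnitude (HAQ tunes this value instead).
        clip = max(abs(w) for w in weights)
    if clip == 0:
        return list(weights)
    scale = clip / (2 ** (bits - 1) - 1)
    return [round(max(-clip, min(clip, w)) / scale) * scale for w in weights]


# A mixed-precision policy assigns each layer its own bitwidth
# (hypothetical layer names and weights).
layers = {
    "conv1": [0.8, -0.35, 0.1, -1.2, 0.55],
    "fc": [0.02, -0.4, 0.31, 0.9, -0.77],
}
policy = {"conv1": 8, "fc": 4}

for name, w in layers.items():
    q = linear_quantize(w, policy[name])
    err = max(abs(a - b) for a, b in zip(w, q))
    print(f"{name}: {policy[name]}-bit, max abs error {err:.4f}")
```

Fewer bits mean a coarser grid: at 4 bits a layer has at most 15 levels (-7 to 7 times the step), at 8 bits at most 255. That is the accuracy side of the trade-off the search balances against latency, energy, and model size.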
