AutoQ: Automated Kernel-Wise Neural Network Quantization

Qian Lou; Feng Guo; Lantao Liu; Minje Kim; Lei Jiang

arXiv:1902.05690 · cs.LG · February 11, 2020 · 22 cites

TL;DR

AutoQ is an automated hierarchical deep reinforcement learning (DRL) method that searches for kernel-wise quantization bitwidths in CNNs, reducing inference latency by 54.06% and energy consumption by 50.69% on average while matching the accuracy of state-of-the-art methods.

Contribution

AutoQ introduces a hierarchical-DRL approach for automatic kernel-wise quantization, outperforming prior heuristic and DRL methods in CNN inference efficiency.

Findings

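01. Reduces inference latency by 54.06% on average relative to models quantized by state-of-the-art DRL-based schemes.
02. Decreases inference energy consumption by 50.69% on average against the same baseline.
03. Maintains the same inference accuracy as those state-of-the-art methods.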

Abstract

Network quantization is one of the most hardware-friendly techniques for deploying convolutional neural networks (CNNs) on low-power mobile devices. Recent network quantization techniques quantize each weight kernel in a convolutional layer independently for higher inference accuracy, since the weight kernels in a layer exhibit different variances and hence have different amounts of redundancy. The quantization bitwidth or bit number (QBN) directly determines the inference accuracy, latency, energy and hardware overhead. To effectively reduce the redundancy and accelerate CNN inference, different weight kernels should be quantized with different QBNs. However, prior works use only one QBN to quantize each convolutional layer or the entire CNN, because the design space of searching a QBN for each weight kernel is too large. The hand-crafted heuristic of the kernel-wise QBN…
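To make the idea of kernel-wise QBNs concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a symmetric uniform quantizer (the abstract does not say which quantizer AutoQ uses), and the names `quantize_kernel`, `quantize_layer_kernelwise`, `mean_squared_error`, and `search_space_size` are hypothetical. The bitwidths in the demo are hand-picked for illustration; they are not the output of any search.

```python
from typing import List, Sequence


def quantize_kernel(weights: Sequence[float], bits: int) -> List[float]:
    """Quantize one kernel with a symmetric uniform quantizer.

    ASSUMPTION: this quantizer is illustrative only; it is not taken from AutoQ.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus magnitude)")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    levels = 2 ** (bits - 1) - 1   # largest integer code on the positive side
    scale = max_abs / levels       # step size between adjacent quantized values
    return [round(w / scale) * scale for w in weights]


def quantize_layer_kernelwise(
    kernels: Sequence[Sequence[float]], qbns: Sequence[int]
) -> List[List[float]]:
    """Quantize every kernel in a layer with its own bitwidth (QBN)."""
    if len(kernels) != len(qbns):
        raise ValueError("need exactly one QBN per kernel")
    return [quantize_kernel(k, b) for k, b in zip(kernels, qbns)]


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared difference between original and quantized weights."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_space_size(num_kernels: int, num_bit_choices: int) -> int:
    """Number of distinct kernel-wise QBN assignments for one layer."""
    return num_bit_choices ** num_kernels


if __name__ == "__main__":
    # Two toy "kernels" with different spreads, flattened to 1-D lists.
    layer = [[0.9, -0.3, 0.1, 0.45], [0.02, -0.01, 0.015, 0.0]]
    quantized = quantize_layer_kernelwise(layer, qbns=[6, 3])
    for original, q in zip(layer, quantized):
        print(q, mean_squared_error(original, q))
    # A layer with 64 kernels and 8 candidate bitwidths per kernel:
    print(search_space_size(64, 8))
```

The last call illustrates why exhaustive search is impractical: with 8 candidate bitwidths per kernel, a single layer of 64 kernels already has 8^64 possible assignments, and the count multiplies across layers.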

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning