AutoQ: Automated Kernel-Wise Neural Network Quantization
Qian Lou, Feng Guo, Lantao Liu, Minje Kim, Lei Jiang

TL;DR
AutoQ is an automated hierarchical deep reinforcement learning (DRL) method that searches for a quantization bitwidth for each weight kernel in a CNN, reducing inference latency and energy consumption without sacrificing accuracy.
Contribution
AutoQ introduces a hierarchical-DRL approach for automatic kernel-wise quantization, outperforming prior heuristic and DRL methods in CNN inference efficiency.
Findings
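Reduces inference latency by 54.06% on average relative to the same models quantized by state-of-the-art DRL-based schemes.
Decreases inference energy consumption by 50.69% on average against the same baseline.
Matches the inference accuracy of those state-of-the-art methods.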
Abstract
Network quantization is one of the most hardware-friendly techniques to enable the deployment of convolutional neural networks (CNNs) on low-power mobile devices. Recent network quantization techniques quantize each weight kernel in a convolutional layer independently for higher inference accuracy, since the weight kernels in a layer exhibit different variances and hence have different amounts of redundancy. The quantization bitwidth or bit number (QBN) directly decides the inference accuracy, latency, energy and hardware overhead. To effectively reduce the redundancy and accelerate CNN inference, various weight kernels should be quantized with different QBNs. However, prior works use only one QBN to quantize each convolutional layer or the entire CNN, because the design space of searching a QBN for each weight kernel is too large. The hand-crafted heuristic of the kernel-wise QBN…
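To make the idea of kernel-wise QBNs concrete, here is a minimal Python sketch, assuming a plain symmetric uniform quantizer. It is not AutoQ's quantizer or its search procedure, neither of which is specified in this summary. The function names, the toy weights, and the hand-picked QBNs are all hypothetical; in AutoQ the per-kernel QBNs would come from the hierarchical-DRL agent rather than being fixed by hand.

```python
from typing import List, Sequence


def quantize_kernel(kernel: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization of one weight kernel (illustrative only).

    Returns the de-quantized values so the rounding error can be inspected.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus magnitude)")
    max_abs = max((abs(w) for w in kernel), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in kernel]
    levels = 2 ** (bits - 1) - 1      # largest representable integer magnitude
    scale = max_abs / levels          # one scale per kernel, not per layer
    out = []
    for w in kernel:
        q = round(w / scale)
        q = max(-levels, min(levels, q))  # clamp to the signed integer range
        out.append(q * scale)
    return out


def quantize_layer_kernel_wise(
    kernels: Sequence[Sequence[float]], qbns: Sequence[int]
) -> List[List[float]]:
    """Quantize each kernel of a layer with its own QBN."""
    if len(kernels) != len(qbns):
        raise ValueError("need exactly one QBN per kernel")
    return [quantize_kernel(k, b) for k, b in zip(kernels, qbns)]


# Toy layer with two flattened kernels: one wide-ranging, one nearly zero.
layer = [
    [0.9, -0.8, 0.7, -0.95],
    [0.01, -0.02, 0.015, 0.0],
]
qbns = [6, 3]  # hypothetical per-kernel bitwidths, chosen by hand here
quantized = quantize_layer_kernel_wise(layer, qbns)
average_qbn = sum(qbns) / len(qbns)
```

In this toy layer the average QBN is 4.5 bits, whereas a layer-wise scheme would have to apply a single QBN to both kernels. Because each kernel has its own scale, the rounding error per weight is bounded by half of that kernel's step size.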
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
