HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

Zhen Dong; Zhewei Yao; Amir Gholami; Michael Mahoney; Kurt Keutzer

arXiv:1905.03696 · cs.CV · March 29, 2020 · 36 cites

TL;DR

This paper introduces HAWQ, a second-order method for mixed-precision neural network quantization that automatically determines layer-wise precision and fine-tuning order, leading to efficient compression with minimal accuracy loss.
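The "second-order" signal in HAWQ is, to our understanding, the top eigenvalue of each block's Hessian. That eigenvalue can be estimated without ever forming the Hessian by running power iteration on Hessian-vector products. The sketch below is a framework-free illustration under that assumption and is not code from the official repository. `hvp` is a hypothetical callable that returns H·v. In a real network it would come from automatic differentiation; here an explicit 2×2 symmetric matrix stands in for it. Power iteration converges to the eigenvalue that is largest in magnitude.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def top_eigenvalue(hvp: Callable[[Vector], Vector], dim: int,
                   iters: int = 100, tol: float = 1e-8, seed: int = 0) -> float:
    """Estimate the dominant eigenvalue of a symmetric operator by power iteration.

    `hvp(v)` must return H @ v. H itself is never materialized, which is what
    makes the method usable for the Hessian of a large network block.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(v)
        new_eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0  # v lies in the null space of H
        v = [x / norm for x in hv]
        if abs(new_eig - eig) <= tol * max(1.0, abs(new_eig)):
            return new_eig
        eig = new_eig
    return eig


# Toy stand-in for a block Hessian; its eigenvalues are (7 +/- sqrt(5)) / 2.
H = [[4.0, 1.0], [1.0, 3.0]]
lam = top_eigenvalue(lambda v: [sum(a * b for a, b in zip(row, v)) for row in H], dim=2)
print(lam)  # close to (7 + sqrt(5)) / 2, about 4.618
```

The operator-style interface is the design point: with a deep-learning framework, the matrix product would be replaced by a Hessian-vector product computed by differentiating the gradient, so memory stays linear in the number of parameters.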

Contribution

HAWQ provides a systematic, Hessian-based approach for automatic mixed-precision quantization and layer-wise fine-tuning order, improving over existing methods in accuracy and compression.
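The two automatic decisions named above, layer-wise precision and fine-tuning order, can be sketched as two sorts over per-block statistics. This is an illustration under stated assumptions, not the authors' implementation. We assume precision is ranked by the top eigenvalue divided by the block's parameter count (λ_i / n_i), and fine-tuning order by λ_i · ||Q(W_i) − W_i||², with larger values handled first. The names `Block`, `uniform_quantize`, `assign_bits`, and `finetune_order` are hypothetical. The symmetric uniform quantizer and the equal-size grouping of ranked blocks into bit widths are placeholder policies of ours.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Block:
    name: str
    top_eig: float        # top Hessian eigenvalue, e.g. from power iteration
    weights: List[float]  # flattened weights of the block (must be non-empty)


def uniform_quantize(ws: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization to `bits` bits (illustrative quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in ws) / qmax
    if scale == 0.0:
        return list(ws)
    return [round(w / scale) * scale for w in ws]


def assign_bits(blocks: List[Block], bit_choices: Sequence[int]) -> Dict[str, int]:
    """Give higher precision to blocks with larger lambda_i / n_i (assumed criterion).

    `bit_choices` is ordered from highest to lowest precision. Ranked blocks are
    split into near-equal contiguous groups, one group per bit width
    (a placeholder policy, not HAWQ's exact selection procedure).
    """
    ranked = sorted(blocks, key=lambda b: b.top_eig / len(b.weights), reverse=True)
    groups = len(bit_choices)
    return {b.name: bit_choices[i * groups // len(ranked)] for i, b in enumerate(ranked)}


def finetune_order(blocks: List[Block], bits: Dict[str, int]) -> List[str]:
    """Order blocks by lambda_i * ||Q(W_i) - W_i||^2, largest first (assumed criterion)."""
    def omega(b: Block) -> float:
        q = uniform_quantize(b.weights, bits[b.name])
        return b.top_eig * sum((qi - wi) ** 2 for qi, wi in zip(q, b.weights))
    return [b.name for b in sorted(blocks, key=omega, reverse=True)]


# Hypothetical blocks with made-up eigenvalues and weights.
blocks = [
    Block("conv1", top_eig=50.0, weights=[0.5, -0.2, 0.1, 0.05]),
    Block("layer1", top_eig=20.0, weights=[0.3, -0.4, 0.25, 0.1, -0.05, 0.2, 0.15, -0.3]),
    Block("fc", top_eig=1.0, weights=[0.9, -0.7, 0.6, 0.2]),
]
bits = assign_bits(blocks, bit_choices=(8, 4, 2))
print(bits)                          # {'conv1': 8, 'layer1': 4, 'fc': 2}
print(finetune_order(blocks, bits))  # ['fc', 'layer1', 'conv1']
```

In this toy run the least sensitive block receives the fewest bits, so its quantization error is the largest and, under the assumed Ω criterion, it is fine-tuned first even though its eigenvalue is smallest.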

Findings

1. Achieves 8x activation compression with similar or better accuracy on ResNet20.
2. Achieves up to 1% higher accuracy with 14% smaller models on ResNet50 and Inception-V3.
3. Quantizes SqueezeNext to 1MB while keeping over 68% top-1 accuracy on ImageNet.
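The compression figures above are ratios against a 32-bit floating-point baseline. For example, storing 32-bit activations in 4 bits gives 32 / 4 = 8x. The helper below makes the arithmetic explicit for mixed-precision weights. The per-layer parameter counts are hypothetical and are not taken from any of the models listed.

```python
from typing import Sequence, Tuple

Layer = Tuple[int, int]  # (number of parameters, bit width)


def model_size_bytes(layers: Sequence[Layer]) -> float:
    """Storage for quantized weights: sum of n_i * b_i bits, converted to bytes."""
    return sum(n * b for n, b in layers) / 8


def compression_ratio(layers: Sequence[Layer], baseline_bits: int = 32) -> float:
    """How many times smaller the model is than storing every parameter at baseline_bits."""
    total_params = sum(n for n, _ in layers)
    return total_params * baseline_bits / sum(n * b for n, b in layers)


# Hypothetical 3-layer model with mixed 8/4/2-bit weights.
layers = [(1000, 8), (1000, 4), (2000, 2)]
print(model_size_bytes(layers))      # 2000.0
print(compression_ratio(layers))     # 8.0
print(compression_ratio([(1, 4)]))   # 8.0, uniform 4-bit versus 32-bit
```

The example shows that a mixed-precision assignment can reach the same average ratio as uniform 4-bit quantization while keeping some layers at 8 bits.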

Abstract

Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order…
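The complexity argument in the abstract can be made concrete with a little arithmetic. Assuming each of L layers independently picks one of b bit widths, there are b^L precision configurations, and ordering L blocks for fine-tuning admits L! permutations. The third function is an illustrative assumption of ours, not a claim about HAWQ: if a sensitivity ranking forces bit widths to be non-increasing along that ranking, only C(L + b − 1, b − 1) assignments remain.

```python
import math


def mixed_precision_configs(num_layers: int, num_bit_choices: int) -> int:
    """Unconstrained search space: each layer picks any bit width."""
    return num_bit_choices ** num_layers


def finetune_orders(num_blocks: int) -> int:
    """Number of possible block-wise fine-tuning orders."""
    return math.factorial(num_blocks)


def ranked_configs(num_layers: int, num_bit_choices: int) -> int:
    """Assignments that are non-increasing along a fixed sensitivity ranking
    (illustrative constraint): a multiset count, C(L + b - 1, b - 1)."""
    return math.comb(num_layers + num_bit_choices - 1, num_bit_choices - 1)


print(mixed_precision_configs(50, 4))  # 1267650600228229401496703205376 (= 2**100)
print(finetune_orders(20))             # 2432902008176640000
print(ranked_configs(50, 4))           # 23426
```

The gap between 2^100 and 23,426 shows why any ordering signal, such as a Hessian-based ranking, makes the precision search tractable.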

Code & Models

Repositories

zhen-dong/hawq (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Max Pooling · Xavier Initialization · 1x1 Convolution · Dense Connections · Average Pooling · Convolution · Residual Connection · Global Average Pooling