APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang; Kuan Wang; Han Cai; Ji Lin; Zhijian Liu; Song Han

arXiv:2006.08509 · cs.LG · June 16, 2020 · 22 cites

TL;DR

APQ jointly searches the neural architecture, pruning policy, and quantization policy, and uses transfer learning to train its accuracy predictor, which lowers the search cost (GPU hours and CO2 emissions) of optimizing models for resource-limited hardware.

Contribution

It proposes a novel joint optimization framework for architecture, pruning, and quantization, with a transfer learning-based accuracy predictor to improve efficiency and reduce training costs.

Findings

01. Achieves a 2x latency reduction on ImageNet at the same accuracy.

02. Reduces GPU hours and CO2 emissions compared with optimizing architecture, pruning, and quantization separately.

03. Outperforms existing methods in accuracy and efficiency.

Abstract

We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and thus is highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves the sample…
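The predictor-transfer idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it swaps the learned accuracy predictor for a ridge regressor and stands in for fine-tuning by regularizing the quantized predictor toward the full-precision weights. The feature encoding, its length `DIM`, the helpers `fit_ridge` and `make_pairs`, and all data are hypothetical and synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical length of a candidate's feature encoding

def fit_ridge(X, y, prior, lam=1.0):
    """Solve argmin_w ||X w - y||^2 + lam * ||w - prior||^2 in closed form."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * prior)

def make_pairs(w_true, n):
    """Synthetic <encoding, accuracy score> pairs with small label noise."""
    X = rng.normal(size=(n, DIM))
    return X, X @ w_true + 0.01 * rng.normal(size=n)

# Toy assumption: the quantized response is a small shift of the fp32 one.
w_true_fp32 = rng.normal(size=DIM)
w_true_int8 = w_true_fp32 + 0.1 * rng.normal(size=DIM)

# Cheap to collect: many full-precision pairs.
X_fp32, y_fp32 = make_pairs(w_true_fp32, 2000)
w_fp32 = fit_ridge(X_fp32, y_fp32, prior=np.zeros(DIM))

# Expensive to collect: only a handful of quantized pairs.
X_int8, y_int8 = make_pairs(w_true_int8, 10)
w_scratch = fit_ridge(X_int8, y_int8, prior=np.zeros(DIM))  # no transfer
w_transfer = fit_ridge(X_int8, y_int8, prior=w_fp32)        # start from fp32

# Compare both quantized predictors on held-out quantized pairs.
X_test, y_test = make_pairs(w_true_int8, 500)
scratch_mse = float(np.mean((X_test @ w_scratch - y_test) ** 2))
transfer_mse = float(np.mean((X_test @ w_transfer - y_test) ** 2))
print(f"scratch MSE:  {scratch_mse:.4f}")
print(f"transfer MSE: {transfer_mse:.4f}")
```

In this toy setup ten quantized pairs cannot pin down sixteen weights on their own, so the predictor that starts from the full-precision weights should show the lower held-out error. The sketch only illustrates why transfer can improve sample efficiency and says nothing about the gains reported in the paper.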

Code & Models

Repositories

mit-han-lab/apq (PyTorch, official)

Videos

[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy · YouTube

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Pruning