Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment

Jemin Lee; Misun Yu; Yongin Kwon; Taeho Kim

arXiv:2202.05048·cs.LG·February 22, 2022

TL;DR

Quantune is an auto-tuning framework that uses gradient boosting to efficiently optimize post-training quantization configurations for CNNs, significantly reducing search time while maintaining high accuracy.

Contribution

The paper introduces Quantune, a gradient boosting-based auto-tuner that accelerates the search for optimal quantization settings and outperforms traditional search methods in both speed and accuracy.

Findings

01 Reduces quantization search time by approximately 36.5x.

02 Maintains accuracy loss within 0.07% to 0.65%.

03 Effective across diverse CNN models, including fragile ones.

Abstract

To adopt convolutional neural networks (CNNs) for a range of resource-constrained targets, it is necessary to compress the CNN models by performing quantization, whereby a high-precision representation is converted to a lower-bit representation. To overcome problems such as sensitivity of the training dataset, high computational requirements, and large time consumption, post-training quantization methods that do not require retraining have been proposed. In addition, to compensate for the accuracy drop without retraining, previous studies on post-training quantization have proposed several complementary methods: calibration, schemes, clipping, granularity, and mixed-precision. To generate a quantized model with minimal error, it is necessary to study all possible combinations of these methods, because they are complementary and CNN models have different characteristics. However, an…
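To make the idea of a configuration search concrete, the following is a minimal sketch, not the paper's implementation. It quantizes a toy list of weights to 8 bits under two schemes (symmetric and asymmetric) and a few clipping ratios, then enumerates every combination and keeps the one with the lowest reconstruction error. The function names, the option values in `SCHEMES` and `CLIP_RATIOS`, and the use of mean squared error as the objective are all assumptions made for illustration; the paper tunes against model accuracy and, per the summary above, uses gradient boosting to avoid exhaustive enumeration.

```python
import itertools


def quantize(values, scheme, clip_ratio, bits=8):
    """Quantize floats to `bits`-bit integers and return the dequantized floats.

    `scheme` and `clip_ratio` are hypothetical knobs for this sketch.
    """
    # Max-based calibration, optionally shrunk by a clipping ratio.
    lo = min(values) * clip_ratio
    hi = max(values) * clip_ratio
    if scheme == "symmetric":
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax
        scale = max(abs(lo), abs(hi)) / qmax or 1.0  # guard against all zeros
        zero_point = 0
    else:  # asymmetric
        qmin, qmax = 0, 2 ** bits - 1
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = round(-lo / scale)
    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))  # saturate out-of-range values
        dequantized.append((q - zero_point) * scale)
    return dequantized


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical option values; the real search space is defined in the paper.
SCHEMES = ["symmetric", "asymmetric"]
CLIP_RATIOS = [1.0, 0.9, 0.8]


def exhaustive_search(values):
    """Score every (scheme, clip_ratio) pair and return the best one."""
    results = {}
    for scheme, clip in itertools.product(SCHEMES, CLIP_RATIOS):
        results[(scheme, clip)] = mse(values, quantize(values, scheme, clip))
    best = min(results, key=results.get)
    return best, results


weights = [-0.42, -0.1, 0.0, 0.03, 0.2, 0.75, 1.3]
best, results = exhaustive_search(weights)
print(best, results[best])
```

Even this toy space grows multiplicatively with each added option, which is why brute-force enumeration becomes costly once calibration, granularity, and mixed-precision choices are included and each evaluation requires running a full model.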

Code & Models

Repositories

etri/nest-compiler
pytorch · Official

Taxonomy

Methods: Average Pooling · Max Pooling · Softmax · Global Average Pooling · Residual Connection · Fire Module · Dropout · Xavier Initialization · Convolution