Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Weihan Chen; Peisong Wang; Jian Cheng

arXiv:2110.06554 · cs.CV · October 14, 2021

Open Access

TL;DR

This paper introduces a principled framework for mixed-precision neural network quantization. Bit-width selection is formulated as a constrained optimization problem and solved with a greedy algorithm, which the authors report improves both accuracy and computational efficiency over existing methods.

Contribution

The paper presents an optimization-based approach to mixed-precision quantization: it reformulates the bit-width selection problem as a Multiple-Choice Knapsack Problem (MCKP) and provides an efficient method for solving it.
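To make the knapsack view concrete, here is a minimal sketch of a generic greedy heuristic for an MCKP-style bit-width assignment. It is not the authors' algorithm. The function name `greedy_bit_allocation`, the layer names, and all cost and loss numbers are hypothetical and chosen only for illustration. Each layer must pick exactly one bit-width, and the total cost must stay within a budget.

```python
from typing import Dict, List, Tuple

# Hypothetical per-layer option: (bit_width, cost, estimated_loss_increase).
Option = Tuple[int, float, float]


def greedy_bit_allocation(layers: Dict[str, List[Option]], budget: float) -> Dict[str, int]:
    """Choose one bit-width per layer so that the total cost fits the budget."""
    # Order each layer's options from most to least expensive.
    options = {
        name: sorted(opts, key=lambda o: o[1], reverse=True)
        for name, opts in layers.items()
    }
    choice = {name: 0 for name in options}  # index into the sorted options
    total = sum(opts[0][1] for opts in options.values())

    while total > budget:
        best_name, best_ratio = None, float("inf")
        for name, opts in options.items():
            i = choice[name]
            if i + 1 >= len(opts):
                continue  # this layer is already at its cheapest option
            saved = opts[i][1] - opts[i + 1][1]
            added = opts[i + 1][2] - opts[i][2]
            if saved <= 0:
                continue
            # Prefer the step that adds the least loss per unit of cost saved.
            ratio = added / saved
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("budget cannot be met with the available bit-widths")
        i = choice[best_name]
        total -= options[best_name][i][1] - options[best_name][i + 1][1]
        choice[best_name] += 1

    return {name: options[name][i][0] for name, i in choice.items()}


# Toy example with made-up numbers.
layers = {
    "conv1": [(8, 8.0, 0.0), (4, 4.0, 0.5), (2, 2.0, 3.0)],
    "conv2": [(8, 16.0, 0.0), (4, 8.0, 0.1), (2, 4.0, 0.4)],
}
assignment = greedy_bit_allocation(layers, budget=14.0)
print(assignment)  # {'conv1': 8, 'conv2': 2}
```

In this toy case the heuristic lowers `conv2` twice, because each of those steps saves more cost per unit of added loss than lowering `conv1`. A greedy rule like this is fast but is not guaranteed to find the optimal assignment in general.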

Findings

1. Outperforms existing quantization methods on ImageNet.
2. Achieves better accuracy with lower computational cost.
3. Demonstrates effectiveness across various network architectures.

Abstract

Quantization is a widely used technique for compressing and accelerating deep neural networks. However, conventional quantization methods use the same bit-width for all (or most) layers, which often causes significant accuracy degradation in the ultra-low-precision regime and ignores the fact that emerging hardware accelerators have begun to support mixed-precision computation. We therefore present a novel and principled framework for solving the mixed-precision quantization problem. We first formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to compute its Hessian matrix. Finally, based on the above simplification, we show that the original problem can be reformulated as a…
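The first two steps described in the abstract can be written out in generic form. The notation below is an assumption made for illustration and is not taken from the paper: $b_l$ is the bit-width of layer $l$, $\mathcal{B}$ the set of candidate bit-widths, $c_l(b_l)$ the cost of layer $l$ at that bit-width, $C$ the total budget, $\Delta w$ the weight perturbation introduced by quantization, and $g$ and $H$ the gradient and Hessian of the loss $\mathcal{L}$ at the full-precision weights $w$.

```latex
% Discrete constrained optimization over per-layer bit-widths (illustrative notation).
\[
\min_{b_1,\dots,b_L \in \mathcal{B}} \; \mathcal{L}\bigl(w + \Delta w(b_1,\dots,b_L)\bigr)
\quad \text{s.t.} \quad \sum_{l=1}^{L} c_l(b_l) \le C
\]

% Second-order Taylor expansion of the loss around the full-precision weights.
\[
\mathcal{L}(w + \Delta w) \;\approx\; \mathcal{L}(w) + g^{\top} \Delta w
+ \tfrac{1}{2}\, \Delta w^{\top} H \, \Delta w
\]
```

Under the common assumption that the pre-trained model sits near a local minimum, $g \approx 0$ and the quadratic term dominates the change in loss. Whether the paper makes exactly this assumption is not stated in the excerpt above.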

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Sparse and Compressive Sensing Techniques