MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Eliska Kloberdanz, Wei Le

TL;DR
MixQuant is a search algorithm that chooses a quantization bit-width for each layer's weights so as to minimize roundoff error. It can be combined with existing quantization methods to improve their accuracy.
Contribution
The paper introduces a layer-wise bit-width optimization approach for weight quantization. The approach is compatible with any quantization method and aims to improve the accuracy and efficiency of the quantized model.
Findings
MixQuant improves accuracy when combined with BRECQ.
MixQuant enhances performance of vanilla asymmetric quantization.
Layer-wise bit-width optimization reduces roundoff errors.
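To make the link between bit-width and roundoff error concrete, the block below writes out the standard textbook form of uniform asymmetric quantization. It is not taken from the paper, and the symbols (scale s, zero-point z, bit-width b) are introduced here for illustration only.

```latex
% Uniform asymmetric quantization of a weight w to b bits
% (standard textbook form, assumed here; not quoted from the paper).
\begin{align}
s &= \frac{w_{\max} - w_{\min}}{2^{b} - 1}, \qquad
z = \operatorname{round}\!\left(-\frac{w_{\min}}{s}\right), \\
q &= \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{w}{s}\right) + z,\; 0,\; 2^{b} - 1\right), \qquad
\hat{w} = s\,(q - z), \\
|w - \hat{w}| &\le \frac{s}{2} \quad \text{for } w \in [w_{\min}, w_{\max}].
\end{align}
```

Under this formulation, removing one bit roughly doubles s and therefore the worst-case roundoff error per weight, which is why a layer with a wide weight range may need more bits than a layer with a narrow one.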
Abstract
Quantization is a technique for creating efficient Deep Neural Networks (DNNs) by performing computations and storing tensors at bit-widths lower than 32-bit floating-point (f32) precision. Quantization reduces model size and inference latency, and therefore allows DNNs to be deployed on platforms with constrained computational resources and in real-time systems. However, quantization can lead to numerical instability caused by roundoff error, which produces inaccurate computations and, in turn, a decrease in quantized model accuracy. Prior works have shown that both biases and activations are more sensitive to quantization and are best kept in full precision or quantized with higher bit-widths. Similarly, we show that some weights are more sensitive than others, which should be reflected in their quantization bit-width. To that end we propose MixQuant, a search algorithm…
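The idea of giving more sensitive layers a higher bit-width can be illustrated with a small sketch. The abstract above is truncated, so this is not the MixQuant algorithm itself: the selection rule (smallest bit-width whose relative roundoff error stays under a threshold) and every name here (`quantize_asymmetric`, `relative_roundoff_error`, `search_bit_widths`, `max_rel_error`) are assumptions made for illustration.

```python
import numpy as np


def quantize_asymmetric(w, bits):
    """Uniform asymmetric quantization followed by dequantization."""
    qmax = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / qmax
    if scale == 0.0:
        # Constant tensor: nothing to round.
        return w.copy()
    zero_point = round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale


def relative_roundoff_error(w, bits):
    """Mean squared roundoff error, normalized by the mean squared weight."""
    err = np.mean((w - quantize_asymmetric(w, bits)) ** 2)
    return float(err / (np.mean(w ** 2) + 1e-12))


def search_bit_widths(layers, candidates=(2, 3, 4, 5, 6, 7, 8), max_rel_error=0.01):
    """Hypothetical rule: per layer, pick the smallest candidate bit-width
    whose relative roundoff error is at most max_rel_error; fall back to
    the largest candidate if none qualifies."""
    chosen = {}
    for name, w in layers.items():
        chosen[name] = max(candidates)
        for bits in sorted(candidates):
            if relative_roundoff_error(w, bits) <= max_rel_error:
                chosen[name] = bits
                break
    return chosen


# Toy "layers": one with evenly spread weights, one with small weights
# plus two large outliers that stretch the quantization range.
rng = np.random.default_rng(0)
layers = {
    "smooth": rng.uniform(-1.0, 1.0, size=4096),
    "outlier": np.concatenate([rng.normal(0.0, 0.02, size=1000), [-1.0, 1.0]]),
}
chosen = search_bit_widths(layers)
print(chosen)
```

In this toy setup the layer with outliers ends up with a higher bit-width than the evenly spread one, because the outliers widen the range and coarsen the grid for the many small weights. A real search would measure sensitivity on the model's actual weights and task accuracy rather than on synthetic tensors.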
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Neural Networks and Applications
