Convolutional Neural Networks Quantization with Attention
Binyi Wu, Bernd Waschneck, Christian Georg Mayr

TL;DR
This paper introduces double-stage Squeeze-and-Threshold (double-stage ST), a method that uses an attention mechanism to quantize deep convolutional neural networks. With it, a 3-bit model exceeds the accuracy of the full-precision baseline model.
Contribution
The paper presents an attention-based quantization method with which a 3-bit model exceeds the accuracy of the full-precision baseline model.
Findings
3-bit model exceeds full-precision accuracy
Double-stage ST is easy to apply: it is inserted before the convolution
State-of-the-art quantization results achieved
Abstract
It has been proven that, compared to using 32-bit floating-point numbers in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate with low precision during inference, thereby saving memory space and power consumption. However, quantizing networks is always accompanied by an accuracy decrease. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST). It uses the attention mechanism to quantize networks and achieves state-of-the-art results. Using our method, the 3-bit model can achieve accuracy that exceeds that of the full-precision baseline model. The proposed double-stage ST activation quantization is easy to apply: it is inserted before the convolution.
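The abstract does not spell out the internals of double-stage ST, so the following is only a rough sketch of the general idea of attention-driven activation quantization, not the authors' implementation. It assumes a squeeze step (global average pooling per channel), a hypothetical one-layer attention branch with a sigmoid that produces one clipping threshold per channel, and plain uniform quantization to 2^bits levels. The names `squeeze`, `thresholds`, `quantize_channel`, `st_quantize`, and the parameter `max_clip` are all illustrative assumptions, and the sketch does not attempt to model the two stages of the actual method.

```python
import math


def squeeze(fmap):
    """Global average pooling: one scalar per channel. fmap is indexed [C][H][W]."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def thresholds(pooled, weight, bias, max_clip):
    """Hypothetical attention branch: linear layer + sigmoid, scaled to (0, max_clip).

    weight is a C x C matrix and bias a length-C vector; in a real network
    these would be learned. This layout is an assumption, not the paper's.
    """
    out = []
    for w_row, b in zip(weight, bias):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        out.append(max_clip / (1.0 + math.exp(-z)))
    return out


def quantize_channel(ch, clip, bits):
    """Clip values to [0, clip] and round them onto 2**bits uniform levels."""
    levels = 2 ** bits - 1
    step = clip / levels
    return [[round(min(max(v, 0.0), clip) / step) * step for v in row] for row in ch]


def st_quantize(fmap, weight, bias, bits=3, max_clip=6.0):
    """Quantize activations with input-dependent, per-channel thresholds.

    Intended to sit in front of a convolution, so the convolution only
    ever sees low-precision activations.
    """
    clips = thresholds(squeeze(fmap), weight, bias, max_clip)
    return [quantize_channel(ch, c, bits) for ch, c in zip(fmap, clips)]
```

Because the thresholds are computed from the pooled input, the quantization grid adapts to each input rather than being fixed after training. With `bits=3`, every output value falls on one of eight levels between 0 and that channel's threshold.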
