Convolutional Neural Networks Quantization with Attention

Binyi Wu; Bernd Waschneck; Christian Georg Mayr

arXiv:2209.15317·cs.AI·October 3, 2022

TL;DR

This paper introduces double-stage Squeeze-and-Threshold (double-stage ST), a method that uses an attention mechanism to quantize deep convolutional neural networks. With this method, a 3-bit model reaches accuracy that exceeds that of the full-precision baseline model.

Contribution

The paper presents an attention-based quantization method with which a 3-bit model can exceed the accuracy of the full-precision baseline model.

Findings

01. 3-bit model exceeds full-precision accuracy

02. Double-stage ST is easy to implement

03. State-of-the-art quantization results achieved

Abstract

It has been shown that Deep Convolutional Neural Networks (DCNNs), although trained with 32-bit floating-point numbers, can operate with low precision during inference, thereby saving memory space and power consumption. However, quantizing networks is always accompanied by an accuracy decrease. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST). It uses the attention mechanism to quantize networks and achieves state-of-the-art results. Using our method, the 3-bit model can achieve accuracy that exceeds the accuracy of the full-precision baseline model. The proposed double-stage ST activation quantization is easy to apply: it is inserted before the convolution.
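To make "3-bit activation quantization inserted before the convolution" concrete, the following is a minimal sketch of generic uniform k-bit activation quantization in NumPy. It is not the authors' double-stage ST: the function name `quantize_activations`, the `thresholds` parameter, and the squeeze rule (twice the per-channel spatial mean) are assumptions chosen for illustration, standing in for whatever thresholds an attention branch would predict.

```python
import numpy as np


def quantize_activations(x, bits=3, thresholds=None):
    """Uniformly quantize activations of shape (N, C, H, W) to 2**bits levels.

    `thresholds` is a hypothetical per-channel clipping value, broadcastable
    to x. In an attention-based scheme it would be predicted by a small
    network; here it defaults to a fixed rule so the sketch is self-contained.
    """
    levels = 2 ** bits - 1  # 3 bits -> integer codes 0..7
    if thresholds is None:
        # Assumed "squeeze" step: global average pool over H and W,
        # then a fixed scale in place of learned attention weights.
        squeezed = x.mean(axis=(2, 3), keepdims=True)
        thresholds = 2.0 * np.abs(squeezed) + 1e-8
    clipped = np.clip(x, 0.0, thresholds)          # keep values in [0, t]
    codes = np.round(clipped / thresholds * levels)  # integer code per value
    dequantized = codes * thresholds / levels        # value seen by the conv
    return dequantized, codes.astype(np.int64)


# Usage: quantize a random activation map before it enters a convolution.
rng = np.random.default_rng(0)
activations = rng.random((2, 4, 8, 8))
quantized, codes = quantize_activations(activations, bits=3)
```

The integer `codes` are what low-precision hardware would store, while `quantized` is the same information rescaled to the original range. Lowering `bits` reduces memory per activation at the cost of coarser values, which is the trade-off the abstract describes.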
