DNN Quantization with Attention

Ghouthi Boukli Hacene; Lukas Mauch; Stefan Uhlich; Fabien Cardinaux

arXiv:2103.13322 · cs.CV · March 25, 2021

TL;DR

This paper introduces DQA, a novel training method that uses attention to relax low-bit quantization of DNNs, enabling high accuracy with reduced memory and energy use.

Contribution

The paper proposes a learnable attention-based approach to progressively relax low-bit quantization, improving accuracy in quantized DNNs compared to existing methods.
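Here is a minimal scalar sketch of the relaxation, in which one weight is quantized at high, medium and low bit widths and the results are mixed with attention weights. It assumes the attention is a temperature-scaled softmax over one learnable logit per bit width, and that each branch is a plain uniform quantizer on [-1, 1]. The bit widths (8, 4, 2), the quantizer, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def quantize(w: float, bits: int) -> float:
    """Uniformly quantize w, clipped to [-1, 1], onto 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1                    # number of intervals between levels
    w = max(-1.0, min(1.0, w))               # clip to the representable range
    index = round((w + 1.0) / 2.0 * steps)   # nearest level index in [0, steps]
    return index / steps * 2.0 - 1.0         # map the index back to [-1, 1]

def softmax(logits, temperature):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical high/medium/low bit widths; the summary does not state them.
BIT_WIDTHS = (8, 4, 2)

def relaxed_quantize(w, logits, temperature, bit_widths=BIT_WIDTHS):
    """Attention-weighted linear combination of w quantized at several bit widths."""
    attention = softmax(logits, temperature)
    return sum(a * quantize(w, b) for a, b in zip(attention, bit_widths))

if __name__ == "__main__":
    w = 0.3
    # Hypothetical logits, one per bit width: equal logits give an even mix.
    print(relaxed_quantize(w, [0.0, 0.0, 0.0], temperature=1.0))
    # A dominant low-bit logit at low temperature: nearly all weight on the 2-bit branch.
    print(relaxed_quantize(w, [0.0, 0.0, 10.0], temperature=0.1))
```

In real training the logits would be learned by gradient descent, and the rounding step typically needs a straight-through estimator to pass gradients; both are left out of this sketch.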

Findings

1. Outperforms other low-bit quantization techniques on CIFAR10, CIFAR100, and ImageNet.
2. Achieves near full-precision accuracy with low-bit quantization.
3. Reduces the accuracy drop in lightweight DNN architectures.

Abstract

Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when we apply it to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that relaxes the low-bit quantization. We call this procedure DNN Quantization with Attention (DQA). The relaxation is achieved by using a learnable linear combination of high, medium and low-bit quantizations. Our learning procedure converges step by step to a low-bit quantization using an attention mechanism with temperature scheduling. In experiments, our approach outperforms other low-bit quantization techniques on various object recognition benchmarks such as CIFAR10, CIFAR100 and…
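The temperature scheduling can be illustrated in isolation. Lowering the temperature T of a softmax pushes the attention toward a one-hot vector on the largest logit, so a weighted combination of quantizers gradually collapses to a single one. This sketch assumes an exponential schedule from T = 1.0 to T = 0.01 and fixed logits in which the low-bit entry is already the largest. This summary gives neither the paper's actual schedule nor the mechanism that makes the low-bit branch win, so every name and value here is hypothetical.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over per-bit-width logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_at(step, total_steps, t_start=1.0, t_end=0.01):
    """Hypothetical exponential decay from t_start (step 0) to t_end (last step)."""
    return t_start * (t_end / t_start) ** (step / total_steps)

if __name__ == "__main__":
    # Hypothetical learned logits for (high, medium, low) bit widths; the
    # low-bit entry is assumed to already be the largest.
    logits = [0.2, 0.5, 1.0]
    total_steps = 100
    for step in range(0, total_steps + 1, 25):
        t = temperature_at(step, total_steps)
        attention = softmax(logits, t)
        print(f"step {step:3d}  T={t:.4f}  attention={[round(a, 3) for a in attention]}")
```

Because every shifted logit (x - max) / T becomes more negative as T falls, the largest attention weight can only grow along the schedule. By the final step it is effectively 1.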

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning