DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference

Bahareh Khabbazan; Marc Riera; Antonio González

arXiv:2306.16430·cs.LG·November 23, 2023

Open Access

TL;DR

DNA-TEQ is an adaptive exponential quantization method for DNN tensors. It compresses tensors by 40% on average relative to an INT8 baseline and cuts dot-product energy consumption by 66%, while maintaining accuracy without retraining, which makes it suitable for deployment on embedded systems.

Contribution

The paper proposes DNA-TEQ, a novel exponential quantization scheme that adaptively quantizes tensors based on their distribution, outperforming linear methods in compression and energy efficiency.

Findings

01  Achieves 40% average compression over INT8 baseline.

02  Reduces energy consumption by 66% in dot-product operations.

03  Maintains high accuracy without retraining DNNs.
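
One general reason an exponential representation can make dot products cheaper is that multiplying two powers of the same base reduces to adding their integer exponents. The sketch below illustrates that property only. The (sign, exponent) encoding, the function name dot_exponent_domain, and the counting scheme are assumptions made for illustration; this page does not describe how DNA-TEQ actually implements its dot product.

```python
from collections import Counter

def dot_exponent_domain(a_codes, w_codes, base):
    """Dot product of two vectors stored as (sign, k) pairs.

    Each pair is assumed to encode the value sign * base**(-k).
    Hypothetical encoding, not taken from the paper.
    """
    counts = Counter()
    for (sa, ka), (sw, kw) in zip(a_codes, w_codes):
        # base**(-ka) * base**(-kw) == base**(-(ka + kw)):
        # the multiplication becomes an integer addition of exponents.
        counts[ka + kw] += sa * sw
    # One scaling per distinct exponent sum, instead of one per element pair.
    return sum(c * base ** (-k) for k, c in counts.items())

activations = [(1, 0), (1, 1), (-1, 2)]   # 1.0, 0.5, -0.25 for base 2
weights = [(1, 1), (1, 1), (1, 0)]        # 0.5, 0.5, 1.0 for base 2
result = dot_exponent_domain(activations, weights, base=2.0)
print(result)
```

The number of distinct exponent sums is bounded by the exponent bit-width, so the final accumulation touches far fewer terms than the vector length.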

Abstract

Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant amount of tensors fit into an exponential distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that…
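
To make the contrast between linear and exponential quantization concrete, here is a minimal sketch in plain Python. It is not the DNA-TEQ algorithm: the sign-plus-exponent encoding, the fixed `base` parameter, the zero code, and the synthetic test tensor are all assumptions chosen for illustration, and the adaptive search over parameters mentioned in the abstract is not reproduced.

```python
import math
import random

def quantize_uniform(values, bits):
    """Symmetric linear quantization: evenly spaced levels over [-max, max].

    Assumes at least one non-zero value.
    """
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]

def quantize_exponential(values, bits, base=1.5):
    """Sign + exponent quantization: v ~ sign(v) * max_abs * base**(-k).

    `base` is a hypothetical fixed parameter; one magnitude code is
    reserved for zero. Assumes at least one non-zero value.
    """
    max_abs = max(abs(v) for v in values)
    max_k = 2 ** (bits - 1) - 2
    out = []
    for v in values:
        if v == 0.0:
            out.append(0.0)
            continue
        # Round in the log domain so levels are dense near zero.
        k = round(-math.log(abs(v) / max_abs, base))
        if k > max_k:
            out.append(0.0)  # too small to represent: flush to zero
        else:
            out.append(math.copysign(max_abs * base ** (-k), v))
    return out

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic tensor with exponentially distributed magnitudes (an assumption).
rng = random.Random(0)
tensor = [rng.choice((-1, 1)) * rng.expovariate(1.0) for _ in range(10_000)]

for bits in (4, 5, 6):
    err_lin = mse(tensor, quantize_uniform(tensor, bits))
    err_exp = mse(tensor, quantize_exponential(tensor, bits))
    print(f"{bits} bits: linear MSE={err_lin:.5f}, exponential MSE={err_exp:.5f}")
```

Which scheme gives the lower error depends on the tensor's distribution and on the chosen base, which is the kind of trade-off an adaptive scheme would have to search over.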

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications