Quality Scalable Quantization Methodology for Deep Learning on Edge

Salman Abdul Khaliq; Rehan Hafiz

arXiv:2407.11260 · cs.DC · July 17, 2024

Open Access

TL;DR

This paper introduces a scalable quantization methodology for CNNs that significantly reduces memory and power consumption, enabling efficient deployment of deep learning models on edge devices without substantial accuracy loss.

Contribution

The work presents a systematic, quality scalable quantization and multiplier design that compresses CNN parameters and reduces hardware complexity for edge computing.

Findings

01 Memory savings up to 82.49%

02 Accuracy maintained near state-of-the-art

03 Increased sparsity with up to 6% zeros in weights

Abstract

Deep learning architectures employ heavy computations, and the bulk of the computational energy is taken up by the convolution operations in Convolutional Neural Networks (CNNs). The objective of our proposed work is to reduce the energy consumption and size of CNNs so that machine learning techniques can be used in edge computing on ubiquitous computing devices. We propose a Systematic Quality Scalable Design Methodology consisting of Quality Scalable Quantization at a higher abstraction level and Quality Scalable Multipliers at a lower abstraction level. The first component consists of parameter compression, where we approximate the values in the filters of deep learning models by encoding them in 3 bits. A shift-and-scale-based on-chip decoding hardware is proposed that can decode these 3-bit representations to recover approximate filter values. The size of the DNN model is reduced this way and can be…
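The 3-bit encoding and shift-and-scale decoding described in the abstract can be illustrated with a small sketch. The excerpt does not give the paper's actual codebook, so the layout used here (1 sign bit plus a 2-bit right-shift amount applied to a single per-filter scale) is an assumption, and the names `encode_3bit`, `decode_3bit`, and `scale` are hypothetical rather than taken from the paper.

```python
def encode_3bit(weights, scale):
    """Encode each weight as a 3-bit code: 1 sign bit + 2-bit shift amount.

    Assumed layout (not from the paper): code = (sign << 2) | shift,
    representing the value (+/-) scale / 2**shift.
    """
    codes = []
    for w in weights:
        sign = 1 if w < 0 else 0
        mag = abs(w)
        # Choose the shift k in {0, 1, 2, 3} whose level scale / 2**k
        # is closest to the weight's magnitude.
        shift = min(range(4), key=lambda k: abs(mag - scale / (1 << k)))
        codes.append((sign << 2) | shift)
    return codes


def decode_3bit(codes, scale):
    """Recover approximate weights by shifting the scale and applying the sign."""
    out = []
    for c in codes:
        sign = -1.0 if (c >> 2) & 1 else 1.0
        shift = c & 0b11
        out.append(sign * scale / (1 << shift))
    return out


# Illustrative filter values (made up for this example).
weights = [0.9, -0.45, 0.2, -0.1, 0.05]
scale = max(abs(w) for w in weights)  # one scale shared by the whole filter
codes = encode_3bit(weights, scale)
approx = decode_3bit(codes, scale)
```

In this sketch, decoding needs only a shift of the shared scale and a sign flip, which is the kind of operation that maps to simple on-chip hardware. The approximation error grows for weights far from any of the four power-of-two levels, so a real codebook would likely be tuned to the weight distribution.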

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Computing and Algorithms · Advanced Algorithms and Applications

Methods: Convolution