DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations

Wenhao Hu; Paul Henderson; José Cano

arXiv:2412.09687·cs.LG·December 16, 2024

TL;DR

DQA introduces a simple, efficient sub-6-bit activation quantization method for deep neural networks, achieving high accuracy with low computational complexity on resource-constrained devices.

Contribution

The paper presents DQA, a novel sub-6-bit activation quantization technique using shifting and Huffman coding, outperforming existing methods in accuracy and efficiency.
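
This summary says DQA relies on shifting-based operations, but the page does not spell out the procedure. The sketch below illustrates only the general idea under stated assumptions, and it is not the paper's algorithm. It assumes non-negative (post-ReLU) activations are first quantized uniformly with a few extra bits and then right-shifted down to the target bit-width. The helper names (`quantize_unsigned`, `shift_to_target`), the 3-bit target, the 2 extra bits, the max-based scale rule and the input values are all hypothetical choices made for this example.

```python
import numpy as np

def quantize_unsigned(x: np.ndarray, bits: int, scale: float) -> np.ndarray:
    """Uniformly quantize non-negative activations (e.g. post-ReLU) to `bits` bits."""
    qmax = (1 << bits) - 1
    return np.clip(np.round(x / scale), 0, qmax).astype(np.int64)

def shift_to_target(q: np.ndarray, extra_bits: int) -> tuple[np.ndarray, np.ndarray]:
    """Right-shift away `extra_bits` low-order bits (hypothetical helper).

    Returns the target-precision codes and the shifted-out residual bits."""
    high = q >> extra_bits                    # floor division by 2**extra_bits
    residual = q & ((1 << extra_bits) - 1)    # the low-order bits lost by the shift
    return high, residual

# Hypothetical setting: 3-bit target, activations first quantized with 2 extra bits.
target_bits, extra_bits = 3, 2
x = np.array([0.0, 0.3, 1.7, 2.55])
scale = x.max() / ((1 << (target_bits + extra_bits)) - 1)

q = quantize_unsigned(x, target_bits + extra_bits, scale)   # 5-bit codes
high, residual = shift_to_target(q, extra_bits)             # 3-bit codes + 2-bit residuals

# Dequantize the 3-bit codes: each step is now 2**extra_bits times coarser.
x_low = (high << extra_bits) * scale
# Re-attaching the residual bits recovers the 5-bit codes exactly.
q_restored = (high << extra_bits) | residual
```

A right shift truncates rather than rounds, so the dropped residual is always in `[0, 2**extra_bits)`. That is what makes shifting cheap on integer hardware, and it is also why the residual has to be stored somewhere if the finer codes are to be recovered.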

Findings

1. DQA achieves up to 29.28% accuracy improvement over NoisyQuant.

2. DQA is effective across multiple models and tasks, including image classification and segmentation.

3. DQA maintains high accuracy at low bit-widths (3 to 5 bits).

Abstract

Quantization of Deep Neural Network (DNN) activations is a commonly used technique to reduce compute and memory demands during DNN inference, which can be particularly beneficial on resource-constrained devices. To achieve high accuracy, existing methods for quantizing activations rely on complex mathematical computations or perform extensive searches for the best hyper-parameters. However, these expensive operations are impractical on devices with limited computation capabilities, memory capacities, and energy budgets. Furthermore, many existing methods do not focus on sub-6-bit (or deep) quantization. To fill these gaps, in this paper we propose DQA (Deep Quantization of DNN Activations), a new method that focuses on sub-6-bit quantization of activations and leverages simple shifting-based operations and Huffman coding to be efficient and achieve high accuracy. We evaluate DQA with…
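
Huffman coding is the second ingredient the abstract names. How DQA applies it is described in the truncated part of the abstract, so the following is a generic sketch under an explicit assumption: that small integer values, such as the low-order bits discarded by a right shift, are skewed toward a few values and can be entropy-coded with a standard Huffman code. The helpers `huffman_code`, `encode` and `decode` are hypothetical and built on Python's standard `heapq` module. The residual values are made up for illustration.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # unique counter so heap never compares dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # merge the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def encode(symbols: list[int], code: dict[int, str]) -> str:
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in symbols)

def decode(bits: str, code: dict[int, str]) -> list[int]:
    """Decode greedily; valid because a Huffman code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

# Hypothetical 2-bit residuals, skewed toward 0.
residuals = [0, 0, 0, 1, 0, 2, 0, 1, 0, 3]
code = huffman_code(residuals)
bits = encode(residuals, code)
assert decode(bits, code) == residuals
print(len(bits), "bits vs", 2 * len(residuals), "bits at a fixed 2 bits/value")
# prints: 16 bits vs 20 bits at a fixed 2 bits/value
```

The saving depends entirely on how skewed the value distribution is. With uniformly distributed residuals, a Huffman code cannot beat the fixed-width encoding.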
