DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations
Wenhao Hu, Paul Henderson, José Cano

TL;DR
DQA introduces a simple, efficient sub-6-bit activation quantization method for deep neural networks, achieving high accuracy with low computational complexity on resource-constrained devices.
Contribution
The paper presents DQA, a sub-6-bit activation quantization technique built on simple shifting-based operations and Huffman coding, which is reported to outperform existing methods in accuracy and efficiency.
Findings
DQA achieves up to 29.28% accuracy improvement over NoisyQuant.
DQA is effective across multiple models and tasks, including image classification and segmentation.
DQA maintains high accuracy with low-bit quantization levels (3-5 bits).
Abstract
Quantization of Deep Neural Network (DNN) activations is a commonly used technique to reduce compute and memory demands during DNN inference, which can be particularly beneficial on resource-constrained devices. To achieve high accuracy, existing methods for quantizing activations rely on complex mathematical computations or perform extensive searches for the best hyper-parameters. However, these expensive operations are impractical on devices with limited computation capabilities, memory capacities, and energy budgets. Furthermore, many existing methods do not focus on sub-6-bit (or deep) quantization. To fill these gaps, in this paper we propose DQA (Deep Quantization of DNN Activations), a new method that focuses on sub-6-bit quantization of activations and leverages simple shifting-based operations and Huffman coding to be efficient and achieve high accuracy. We evaluate DQA with…
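The abstract names two ingredients, shifting-based operations and Huffman coding, without giving the details in the text available here. The following is a minimal sketch of how the two can fit together, not the authors' algorithm. It assumes non-negative activations, quantizes them to a few extra bits, right-shifts down to the target width, and Huffman-codes the low bits dropped by the shift so they can be restored later. All function names and the bit widths used (4 target bits plus 2 extra bits) are hypothetical choices for illustration.

```python
import heapq
from collections import Counter


def quantize_unsigned(values, bits, max_value):
    """Uniformly map floats in [0, max_value] to integers in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return [min(levels, max(0, round(v / max_value * levels))) for v in values]


def shift_down(codes, extra_bits):
    """Right-shift each code; return (kept high bits, dropped low bits)."""
    mask = (1 << extra_bits) - 1
    return [c >> extra_bits for c in codes], [c & mask for c in codes]


def shift_up(high, low, extra_bits):
    """Left-shift the high bits and add the dropped low bits back."""
    return [(h << extra_bits) | l for h, l in zip(high, low)]


def huffman_table(symbols):
    """Build a prefix-code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    activations = [0.0, 0.1, 0.5, 0.9, 1.0]        # assumed post-ReLU values
    codes = quantize_unsigned(activations, 6, 1.0)  # 4 target bits + 2 extra
    high, low = shift_down(codes, 2)                # 4-bit codes + shift residue
    table = huffman_table(low)
    packed = huffman_encode(low, table)
    restored = shift_up(high, huffman_decode(packed, table), 2)
    print(high, packed, restored == codes)
```

In this sketch the 4-bit `high` values are what a low-bit kernel would consume, while the Huffman-coded residue is side information whose size depends on how skewed the dropped bits are. Which activations receive extra bits, and how the residue is used at inference time, are design choices that would have to be taken from the paper itself.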
