DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Jiajun Zhou; Jiajun Wu; Yizhao Gao; Yuhao Ding; Chaofan Tao; Boyu Li; Fengbin Tu; Kwang-Ting Cheng; Hayden Kwok-Hay So; Ngai Wong

arXiv:2302.12510 · cs.LG · February 14, 2024 · 1 citation

Open Access

TL;DR

DyBit introduces an adaptive, variable-length data representation for low-bit neural network quantization and pairs it with a hardware-aware quantization framework that trades off inference accuracy against speedup.

Contribution

The paper proposes DyBit, an adaptive data representation for low-bit quantization, and a hardware-aware quantization framework with a mixed-precision accelerator for neural network inference.

Findings

01

DyBit achieves 1.997% higher inference accuracy than the state-of-the-art at 4-bit quantization.

02

The framework delivers up to 8.1x inference speedup compared with the original model.

03

DyBit dynamically adjusts the precision and range of its bit-fields to fit the distribution of weights and activations.

Abstract

To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work proposes DyBit, an adaptive data representation with variable-length encoding. DyBit can dynamically adjust the precision and range of its separate bit-fields to adapt to the distribution of DNN weights and activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that the inference accuracy with DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization, and that the proposed framework can achieve up to 8.1x speedup compared with the original model.
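
To make the motivation concrete, the following is a minimal sketch of why the choice of representable values matters at 4 bits. It is not the DyBit encoding, which this page does not specify. It quantizes synthetic, bell-shaped "weights" to two hypothetical 4-bit grids (a uniform grid and a power-of-two grid that is denser near zero) and prints the mean squared error of each. All function names, the Gaussian weight model, and both grids are assumptions made for illustration.

```python
import random

def quantize_to_grid(values, grid):
    """Map each value to the nearest representable level in `grid`."""
    return [min(grid, key=lambda level: abs(level - v)) for v in values]

def mean_squared_error(values, quantized):
    """Average squared difference between original and quantized values."""
    return sum((v - q) ** 2 for v, q in zip(values, quantized)) / len(values)

def uniform_grid(bits, max_abs):
    """Symmetric uniform grid with 2**bits - 1 levels in [-max_abs, max_abs]."""
    half = 2 ** (bits - 1) - 1  # 7 positive levels when bits == 4
    step = max_abs / half
    return [i * step for i in range(-half, half + 1)]

def log_grid(bits, max_abs):
    """Hypothetical non-uniform grid: zero plus power-of-two magnitudes.

    Levels are dense near zero and sparse near max_abs.
    This is an illustrative assumption, NOT the DyBit format.
    """
    half = 2 ** (bits - 1) - 1
    magnitudes = [max_abs / 2 ** k for k in range(half)]
    return sorted([-m for m in magnitudes] + [0.0] + magnitudes)

if __name__ == "__main__":
    random.seed(0)
    # Assumed weight model: zero-mean Gaussian, as a stand-in for real DNN weights.
    weights = [random.gauss(0.0, 0.05) for _ in range(2000)]
    max_abs = max(abs(w) for w in weights)

    for name, grid in [("uniform", uniform_grid(4, max_abs)),
                       ("log", log_grid(4, max_abs))]:
        error = mean_squared_error(weights, quantize_to_grid(weights, grid))
        print(f"{name:8s} levels={len(grid):2d} mse={error:.3e}")
```

Both grids use the same number of levels, so any difference in the printed errors comes only from where those levels are placed relative to the data. A representation that can shift its balance between precision and range, as the abstract describes for DyBit, is aimed at exactly this placement problem.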

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning