Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
Cheng Gong, Ye Lu, Kunpeng Xie, Zongming Jin, Tao Li, Yanzhi Wang

TL;DR
This paper introduces elastic significant bit quantization (ESB), a novel method that optimizes the number of significant bits in DNN quantization to improve accuracy and efficiency, supported by FPGA implementation and extensive experiments.
Contribution
The paper proposes ESB, a flexible quantization method paired with a distribution difference aligner (DDA), and reports higher accuracy and efficiency than existing methods.
Findings
ESB achieves up to 4.78% accuracy improvement on AlexNet.
ESB reduces multiplication complexity by using fewer significant bits.
ESB FPGA accelerator reaches 10.95 GOPS peak performance.
Abstract
Quantization has been proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it is still challenging to strike a good balance between accuracy and efficiency while quantizing DNN weights or activation values from high-precision formats to their quantized counterparts. We propose a new method called elastic significant bit quantization (ESB) that controls the number of significant bits of quantized values to obtain better inference accuracy with fewer resources. We design a unified mathematical formula to constrain the quantized values of the ESB with a flexible number of significant bits. We also introduce a distribution difference aligner (DDA) to quantitatively align the distributions between the full-precision weight or activation values and quantized values. Consequently, ESB is suitable for various bell-shaped distributions of…
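To make the idea of "controlling the number of significant bits" concrete, the following is a minimal sketch that rounds a real value to the nearest number representable with at most k significant bits. It is an illustration only and not the paper's unified formula or the DDA; the function name `round_to_significant_bits` and the round-half-up tie rule are assumptions made for this example.

```python
import math


def round_to_significant_bits(x: float, k: int) -> float:
    """Round x to a value whose binary form has at most k significant bits.

    Hypothetical helper for illustration; not the ESB formula from the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if x == 0.0:
        return 0.0
    # Decompose |x| = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(x))
    scale = 2 ** k
    # Keep k bits of the mantissa (round half up; an assumed tie rule).
    q = math.floor(m * scale + 0.5) / scale
    # Reassemble the magnitude and restore the sign of x.
    return math.copysign(math.ldexp(q, e), x)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round_to_significant_bits(11.0, k))
```

With k = 1 the representable values are powers of two, and each extra significant bit adds finer steps between neighboring powers of two, so larger k trades more multiplication work for lower rounding error.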
Taxonomy
Methods: Batch Normalization · Depthwise Convolution · Pointwise Convolution · 1x1 Convolution · Depthwise Separable Convolution · Convolution · Inverted Residual Block · Average Pooling
