BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices
Yongqi Xu, Yujian Lee, Gao Yi, Bosheng Liu, Yucong Chen, Peng Liu, Jigang Wu, Xiaoming Chen, Yinhe Han

TL;DR
BitQ introduces an analytical framework to optimize block floating point precision in DNNs, significantly enhancing efficiency on resource-limited embedded devices while maintaining accuracy.
Contribution
We develop a bitwidth-aware analytical model for BFP quantization, optimizing block size and bitwidth distribution for DNN inference on embedded platforms.
Findings
Optimized BFP quantization improves computational efficiency.
Preserves accuracy compared to equal bitwidth settings.
Demonstrates effectiveness on benchmark datasets.
Abstract
Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback, however, is their high computational complexity and memory consumption, which makes them infeasible to run in real time on embedded platforms with limited hardware resources. Block floating point (BFP) quantization is a representative compression approach for reducing the memory and computational burden, owing to its ability to capture the broad data distribution of DNN models. Unfortunately, prior works on BFP-based quantization choose the block size and the precision empirically so as to preserve accuracy. In this paper, we develop a BFP-based bitwidth-aware analytical modeling framework (called "BitQ") for the best BFP implementation of DNN inference on embedded platforms. We formulate and resolve an…
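To make the two knobs named in the abstract (block size and precision) concrete, here is a minimal sketch of generic block floating point quantization in plain Python. It is not the BitQ framework or its optimization procedure. The function names `bfp_quantize` and `bits_per_value`, the symmetric clipping of mantissas, and the 8-bit shared exponent are all assumptions made for illustration.

```python
import math


def bfp_quantize(values, block_size, mantissa_bits):
    """Quantize floats to a generic BFP format and return the dequantized values.

    Each block of `block_size` values shares one exponent; each value keeps a
    signed integer mantissa of `mantissa_bits` bits (sign included).
    Hypothetical helper for illustration, not the paper's implementation.
    """
    q_max = 2 ** (mantissa_bits - 1) - 1  # symmetric mantissa range (assumption)
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            out.extend([0.0] * len(block))
            continue
        # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so e is the
        # smallest exponent that covers the largest magnitude in the block.
        shared_exp = math.frexp(max_abs)[1]
        # Step size of one mantissa unit under the shared exponent.
        step = 2.0 ** (shared_exp - (mantissa_bits - 1))
        for v in block:
            q = max(-q_max, min(q_max, round(v / step)))
            out.append(q * step)
    return out


def bits_per_value(block_size, mantissa_bits, exponent_bits=8):
    """Average storage per value: mantissa plus the amortized shared exponent.

    The 8-bit exponent default is an assumption, not a figure from the paper.
    """
    return mantissa_bits + exponent_bits / block_size


weights = [1.0, 0.5, 0.25, 0.05]
print(bfp_quantize(weights, block_size=4, mantissa_bits=4), bits_per_value(4, 4))
print(bfp_quantize(weights, block_size=2, mantissa_bits=4), bits_per_value(2, 4))
```

In this toy input, the smaller block keeps the small value 0.05 from being flushed to zero, but it costs more bits per value because the shared exponent is amortized over fewer elements. That tension is the kind of trade-off a block-size and bitwidth search has to balance.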
Taxonomy
Topics: Neural Networks and Applications · Parallel Computing and Optimization Techniques · Advancements in Semiconductor Devices and Circuit Design
