F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov

TL;DR
F8Net introduces a fixed-point 8-bit multiplication framework for neural network quantization, reducing computational costs while maintaining or improving accuracy compared to traditional methods.
Contribution
The paper proposes a novel fixed-point 8-bit multiplication approach for network quantization, including automatic layer-specific format selection and reformulation of existing algorithms.
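To make "layer-specific format selection" concrete, here is a minimal sketch of one way a fixed-point format could be chosen for a tensor. It assumes signed 8-bit values and picks the fractional length that minimizes mean squared quantization error over a list of candidates. The helper names `quantize` and `choose_frac_bits` are hypothetical, and the error-minimizing rule is a stand-in for illustration; this summary does not state the selection rule the paper actually uses.

```python
def quantize(x, frac_bits):
    """Round x to a signed 8-bit fixed-point value with `frac_bits`
    fractional bits, then map it back to a real number."""
    scale = 1 << frac_bits
    q = max(-128, min(127, round(x * scale)))  # saturate to the int8 range
    return q / scale


def choose_frac_bits(values, candidates=range(0, 8)):
    """Return the candidate fractional length with the lowest mean squared
    quantization error (illustrative rule, not taken from the paper)."""
    def mse(frac_bits):
        return sum((v - quantize(v, frac_bits)) ** 2 for v in values) / len(values)
    return min(candidates, key=mse)


# Small-magnitude values favor many fractional bits (fine resolution);
# large-magnitude values favor few (wide range without clipping).
small_layer = [0.01, -0.02, 0.03]
large_layer = [100.0, -90.0]
print(choose_frac_bits(small_layer), choose_frac_bits(large_layer))
```

The point of the sketch is the trade-off: each extra fractional bit halves the step size but also halves the representable range, so the best format depends on the value distribution of each layer.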
Findings
Achieves accuracy comparable to or better than that of full-precision models.
Reduces inference cost by eliminating high-precision multiplications.
Demonstrates effectiveness on ImageNet with multiple architectures.
Abstract
Neural network quantization is a promising compression technique to reduce memory footprint and save energy consumption, potentially leading to real-time inference. However, there is a performance gap between quantized and full-precision models. To reduce it, existing quantization approaches require high-precision INT32 or full-precision multiplication during inference for scaling or dequantization. This introduces a noticeable cost in terms of memory, speed, and required energy. To tackle these issues, we present F8Net, a novel quantization framework consisting of only fixed-point 8-bit multiplication. To derive our method, we first discuss the advantages of fixed-point multiplication with different formats of fixed-point numbers and study the statistical behavior of the associated fixed-point numbers. Second, based on the statistical and algorithmic analysis, we apply different…
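As a rough illustration of what multiplication between 8-bit fixed-point numbers in different formats involves, the sketch below multiplies two signed 8-bit integers and requantizes the result with a bit shift, so no INT32 or floating-point scaling factor is needed. The function names (`to_fixed`, `from_fixed`, `fixed_mul`) and the rounding and saturation choices are assumptions made for this example; it shows generic fixed-point arithmetic, not the F8Net implementation.

```python
def to_fixed(x, frac_bits):
    """Encode a real number as a signed 8-bit integer with `frac_bits`
    fractional bits, saturating at the int8 limits."""
    q = round(x * (1 << frac_bits))
    return max(-128, min(127, q))


def from_fixed(q, frac_bits):
    """Decode a fixed-point integer back to a real number."""
    return q / (1 << frac_bits)


def fixed_mul(a, a_frac, b, b_frac, out_frac):
    """Multiply two 8-bit fixed-point integers and return an 8-bit result
    with `out_frac` fractional bits."""
    prod = a * b                        # fits in 16 bits; has a_frac + b_frac fractional bits
    shift = a_frac + b_frac - out_frac  # bits to drop to reach the output format
    if shift > 0:
        prod = (prod + (1 << (shift - 1))) >> shift  # add half, then arithmetic right shift
    else:
        prod = prod << -shift
    return max(-128, min(127, prod))    # saturate to the int8 range


a = to_fixed(0.75, 6)   # 0.75 with 6 fractional bits
b = to_fixed(-1.5, 5)   # -1.5 with 5 fractional bits
c = fixed_mul(a, 6, b, 5, out_frac=5)
print(a, b, c, from_fixed(c, 5))
```

Because the formats differ only in where the binary point sits, converting between them is a shift rather than a multiplication by a high-precision scale.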
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Image Processing Techniques
