Exploring the Potential of Flexible 8-bit Format: Design and Algorithm
Zhuoyi Zhang, Yunchen Zhang, Gonglei Shi, Yu Shen, Ruihao Gong, Xiaoxu Xia, Qi Zhang, Lewei Lu, Xianglong Liu

TL;DR
This paper investigates the advantages of FP8 neural network quantization, compares it with INT, and introduces a flexible mixed-precision framework that optimizes quantization formats for different architectures, improving efficiency across multiple tasks.
Contribution
It presents a novel flexible mixed-precision quantization framework supporting various number systems, enabling optimal format selection for different neural network architectures.
Findings
FP8 quantization achieves performance competitive with full precision.
The proposed framework adapts to various neural network tasks.
Experimental results validate the effectiveness of the mixed-precision approach.
Abstract
Neural network quantization is widely used to reduce model inference complexity in real-world deployments. However, traditional integer quantization suffers from accuracy degradation when adapting to various dynamic ranges. Recent research has focused on a new 8-bit format, FP8, with hardware support for both training and inference of neural networks, but this work lacks guidance for hardware design. In this paper, we analyze the benefits of using FP8 quantization and provide a comprehensive comparison of FP8 with INT quantization. We then propose a flexible mixed-precision quantization framework that supports various number systems, enabling selection of the most appropriate quantization format for different neural network architectures. Experimental results demonstrate that our proposed framework achieves competitive performance compared to full precision on various tasks, including…
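The idea of picking a number format per tensor can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`quantize_int8`, `quantize_fp`, `select_format`), the candidate format list, the per-tensor max-based scaling, and the use of mean squared error as the selection criterion are all assumptions made for illustration. The simulated FP8 formats also assume that every exponent code encodes a finite value (no inf/NaN reserved), which differs from some hardware FP8 definitions.

```python
import numpy as np

# Hypothetical candidate formats: name -> (exponent bits, mantissa bits); None means INT8.
FORMATS = {"INT8": None, "E2M5": (2, 5), "E3M4": (3, 4), "E4M3": (4, 3), "E5M2": (5, 2)}

def quantize_int8(x, scale):
    """Symmetric INT8 fake quantization: snap to a uniform grid, then dequantize."""
    return np.clip(np.round(x / scale), -127, 127) * scale

def fp_max(exp_bits, man_bits):
    """Largest finite value, assuming no exponent code is reserved for inf/NaN."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2.0 ** -man_bits) * 2.0 ** (2 ** exp_bits - 1 - bias)

def quantize_fp(x, exp_bits, man_bits):
    """Simulated 8-bit float fake quantization (1 sign bit, subnormals supported)."""
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                       # exponent of the smallest normal number
    max_val = fp_max(exp_bits, man_bits)
    x = np.clip(x, -max_val, max_val)
    # frexp returns |x| = m * 2**ex with m in [0.5, 1), so floor(log2|x|) = ex - 1.
    _, ex = np.frexp(np.abs(x))
    e = np.maximum(ex - 1, min_exp)          # subnormals share the minimum exponent
    step = 2.0 ** (e - man_bits)             # grid spacing within each binade
    return np.round(x / step) * step

def select_format(x):
    """Return the format with the lowest MSE on x (assumed criterion) and all errors."""
    amax = max(float(np.max(np.abs(x))), 1e-12)  # guard against an all-zero tensor
    errors = {}
    for name, spec in FORMATS.items():
        if spec is None:
            xq = quantize_int8(x, amax / 127)
        else:
            scale = amax / fp_max(*spec)     # map the tensor's max onto the format's max
            xq = quantize_fp(x / scale, *spec) * scale
        errors[name] = float(np.mean((x - xq) ** 2))
    return min(errors, key=errors.get), errors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=4096)          # synthetic stand-in for a weight tensor
    best, errors = select_format(weights)
    print(best, errors)
```

Which format wins depends on the distribution of the tensor: formats with more mantissa bits behave more like a uniform grid, while formats with more exponent bits cover a wider dynamic range at coarser resolution.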
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
