All-You-Can-Fit 8-Bit Flexible Floating-Point Format for Accurate and Memory-Efficient Inference of Deep Neural Networks
Cheng-Wei Huang, Tim-Wei Chen, and Juinn-Dar Huang

TL;DR
This paper introduces a highly flexible 8-bit floating-point format (FFP8) for deep neural network inference. It pairs the format with a methodology that tailors the configurable parameters to weight and activation distributions, improving both accuracy and memory efficiency.
Contribution
The paper presents a novel configurable 8-bit floating-point format (FFP8) and a methodology for parameter selection to maximize inference accuracy without retraining.
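This summary does not spell out the paper's selection methodology, and the abstract below is truncated. The sketch that follows is therefore only a hypothetical heuristic showing what "parameter selection" can mean for an 8-bit format. It drops the sign bit when a tensor has no negative values, takes the exponent width as a caller-supplied choice, and picks the largest integer exponent bias whose maximum representable value still covers max|x|. The names `max_representable` and `choose_params` are illustrative. The code also assumes that every exponent code encodes a finite number (no Inf/NaN codes reserved), which is not confirmed by the source.

```python
import math
from collections.abc import Sequence


def max_representable(exp_bits: int, frac_bits: int, bias: int) -> float:
    """Largest finite magnitude, ASSUMING no exponent code is reserved for Inf/NaN."""
    top_significand = 2.0 - 2.0 ** -frac_bits          # 1.111...1 in binary
    return top_significand * 2.0 ** ((1 << exp_bits) - 1 - bias)


def choose_params(values: Sequence[float], exp_bits: int = 4):
    """Hypothetical heuristic (not the paper's exact method).

    Returns (has_sign, exp_bits, frac_bits, bias) for an 8-bit format.
    """
    # Non-negative data (e.g. post-ReLU activations) can reuse the sign bit.
    has_sign = min(values) < 0
    frac_bits = 8 - int(has_sign) - exp_bits
    if frac_bits < 0:
        raise ValueError("exponent field too wide for 8 bits")
    amax = max(abs(v) for v in values)
    if amax == 0:
        raise ValueError("need at least one nonzero value to set the range")
    top_significand = 2.0 - 2.0 ** -frac_bits
    # A larger bias shifts the range toward small magnitudes; keep the largest
    # bias for which max_representable(...) >= amax still holds.
    bias = math.floor((1 << exp_bits) - 1 - math.log2(amax / top_significand))
    return has_sign, exp_bits, frac_bits, bias
```

For example, a signed tensor whose largest magnitude is 480 yields a 1-4-3 (sign-exponent-fraction) layout with bias 7. A non-negative tensor gets a fourth fraction bit instead of a sign bit.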
Findings
Achieves only 0.1-0.3% accuracy loss on image classification models
Requires minimal hardware modifications for FFP8 compatibility
No retraining needed for effective inference with FFP8
Abstract
Modern deep neural network (DNN) models generally require a huge number of weight and activation values to achieve good inference outcomes. These data inevitably demand massive off-chip memory capacity and bandwidth, and the situation gets even worse if they are represented in high-precision floating-point formats. Efforts have been made to represent these data in various 8-bit floating-point formats; nevertheless, a notable accuracy loss is still unavoidable. In this paper we introduce an extremely flexible 8-bit floating-point (FFP8) format whose defining factors (the bit widths of the exponent and fraction fields, the exponent bias, and even the presence of the sign bit) are all configurable. We also present a methodology to properly determine those factors so that the accuracy of model inference can be maximized. This methodology is founded on a key observation: both the…
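To make the configurable factors concrete, here is a minimal encode/decode sketch for such a format. The sign bit is optional, the exponent width and bias are free parameters, and whatever remains of the 8 bits becomes the fraction field. The paper's exact bit semantics are not given in this summary, so the code assumes IEEE-754-style behavior: an implicit leading 1 for normal numbers, subnormals when the exponent field is 0, and no codes reserved for Inf/NaN. `FFP8Config`, `decode`, and `encode` are hypothetical names. Encoding is round-to-nearest by brute force over all 256 codes, which saturates out-of-range inputs to the largest magnitude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FFP8Config:
    """Hypothetical container for the configurable FFP8 factors."""
    has_sign: bool  # whether bit 7 is a sign bit
    exp_bits: int   # exponent field width
    bias: int       # exponent bias (any integer)

    def __post_init__(self):
        if self.exp_bits < 1 or self.frac_bits < 0:
            raise ValueError("sign + exponent bits must fit in 8 bits")

    @property
    def frac_bits(self) -> int:
        # The fraction field gets whatever is left of the 8 bits.
        return 8 - int(self.has_sign) - self.exp_bits


def decode(code: int, cfg: FFP8Config) -> float:
    """Map an 8-bit code (0..255) to its real value."""
    m = cfg.frac_bits
    frac = code & ((1 << m) - 1)
    exp = (code >> m) & ((1 << cfg.exp_bits) - 1)
    negative = cfg.has_sign and bool((code >> 7) & 1)
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        magnitude = (frac / (1 << m)) * 2.0 ** (1 - cfg.bias)
    else:
        # Normal: implicit leading 1.
        magnitude = (1 + frac / (1 << m)) * 2.0 ** (exp - cfg.bias)
    return -magnitude if negative else magnitude


def encode(x: float, cfg: FFP8Config) -> int:
    """Round x to the nearest representable value (saturating at the extremes)."""
    # Exhaustive search over all 256 codes: slow, but simple and correct.
    return min(range(256), key=lambda c: abs(decode(c, cfg) - x))


if __name__ == "__main__":
    signed = FFP8Config(has_sign=True, exp_bits=4, bias=7)     # 1-4-3 layout
    unsigned = FFP8Config(has_sign=False, exp_bits=4, bias=7)  # 0-4-4 layout
    for cfg in (signed, unsigned):
        for v in (0.3, -1.0, 1000.0):
            c = encode(v, cfg)
            print(f"{cfg.frac_bits=} {v} -> 0x{c:02X} -> {decode(c, cfg)}")
```

Dropping the sign bit in the unsigned configuration gives the fraction an extra bit. Negative inputs then round to 0. This is the trade-off that makes a sign-less variant attractive for non-negative data such as post-ReLU activations.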
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
