All-You-Can-Fit 8-Bit Flexible Floating-Point Format for Accurate and Memory-Efficient Inference of Deep Neural Networks

Cheng-Wei Huang, Tim-Wei Chen, and Juinn-Dar Huang

arXiv:2104.07329·cs.LG·April 27, 2021·1 cites

TL;DR

This paper introduces a highly flexible 8-bit floating-point format (FFP8) for deep neural network inference. Its exponent and fraction bit widths, exponent bias, and sign bit are all configurable, and an accompanying methodology chooses them based on the weight and activation distributions to improve accuracy and memory efficiency.

Contribution

The paper presents a novel configurable 8-bit floating-point format (FFP8) and a methodology for parameter selection to maximize inference accuracy without retraining.

Findings

01

Achieves only 0.1-0.3% accuracy loss on image classification models

02

Requires minimal hardware modifications for FFP8 compatibility

03

No retraining needed for effective inference with FFP8

Abstract

Modern deep neural network (DNN) models generally require a huge number of weight and activation values to achieve good inference outcomes. These data inevitably demand massive off-chip memory capacity and bandwidth, and the situation gets even worse if they are represented in high-precision floating-point formats. Efforts have been made to represent these data in different 8-bit floating-point formats; nevertheless, a notable accuracy loss is still unavoidable. In this paper, we introduce an extremely flexible 8-bit floating-point (FFP8) format whose defining factors - the bit width of the exponent/fraction field, the exponent bias, and even the presence of the sign bit - are all configurable. We also present a methodology to properly determine those factors so that the accuracy of model inference can be maximized. This methodology is founded on a key observation - both the…
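To make the configurable factors concrete, the following is a minimal Python sketch of how an 8-bit code could be decoded once the exponent width, exponent bias, and sign-bit presence are fixed. It is not the authors' implementation. The function name `decode_ffp8` and its parameters are hypothetical, and the encoding details are assumptions borrowed from IEEE-754-style formats (an implicit leading one, an all-zero exponent field treated as subnormal, no infinities or NaNs), since the excerpt above does not specify them.

```python
def decode_ffp8(code, exp_bits, bias, signed=True):
    """Decode one 8-bit code under a chosen (exp_bits, bias, signed) setting.

    Assumed, not taken from the paper: IEEE-754-like semantics with an
    implicit leading one, subnormals when the exponent field is zero,
    and no special encodings for infinity or NaN.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    sign_bits = 1 if signed else 0
    # Whatever is left after the sign and exponent goes to the fraction.
    frac_bits = 8 - sign_bits - exp_bits
    if exp_bits < 1 or frac_bits < 0:
        raise ValueError("invalid field widths for an 8-bit format")

    # The sign occupies the top bit only when the format is signed.
    sign = -1.0 if signed and (code >> 7) & 1 else 1.0
    exp_field = (code >> frac_bits) & ((1 << exp_bits) - 1)
    frac_field = code & ((1 << frac_bits) - 1)
    frac = frac_field / (1 << frac_bits)

    if exp_field == 0:
        # Subnormal range: no implicit leading one (assumption).
        magnitude = frac * 2.0 ** (1 - bias)
    else:
        magnitude = (1.0 + frac) * 2.0 ** (exp_field - bias)
    return sign * magnitude


# Same bit pattern, different configurations:
signed_value = decode_ffp8(0x38, exp_bits=4, bias=7, signed=True)
unsigned_value = decode_ffp8(0x78, exp_bits=4, bias=7, signed=False)
small_range_value = decode_ffp8(0x78, exp_bits=4, bias=15, signed=False)
```

Dropping the sign bit frees one extra fraction bit, which could suit tensors whose values are known to be non-negative, and raising the bias shifts the whole representable range toward smaller magnitudes. Both effects are visible in the three example calls.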

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Sparse and Compressive Sensing Techniques