FP8 versus INT8 for efficient deep learning inference
Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

TL;DR
This paper compares FP8 and INT8 formats for efficient deep learning inference, concluding that INT8 is more hardware-efficient and better suited for on-device deployment than FP8.
Contribution
The paper provides a comprehensive theoretical and empirical comparison of FP8 and INT8 formats for inference, highlighting INT8's superior hardware efficiency.
Findings
In dedicated hardware, FP8 is 50-180% less efficient than INT8 in terms of compute.
Post-training quantization results favor INT8 for inference.
Converting FP8-trained networks to INT8 maintains accuracy.
Abstract
Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8, sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance of both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and…
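To make the INT versus FP distinction in the abstract concrete, here is a minimal Python sketch, not taken from the paper. It assumes symmetric per-tensor INT8 quantization with a hypothetical `scale` parameter, and a simplified FP8-like grid with 4 exponent bits, 3 mantissa bits, and bias 7 that reserves no encodings for NaN or infinity, so its largest value (480) differs from standardized FP8 variants. The function names are illustrative only.

```python
import math


def quantize_int8(x: float, scale: float) -> float:
    """Symmetric uniform INT8: round to the nearest multiple of `scale`,
    then clamp the integer code to [-128, 127]."""
    code = max(-128, min(127, round(x / scale)))
    return code * scale


def quantize_fp8_like(x: float, mantissa_bits: int = 3,
                      exponent_bits: int = 4, bias: int = 7) -> float:
    """Round x to the nearest value on a simplified minifloat grid.

    Assumption: every exponent code encodes finite numbers (no NaN/Inf),
    which is a simplification relative to real FP8 formats.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    min_exp = 1 - bias                          # smallest normal exponent
    max_exp = (2 ** exponent_bits - 1) - bias   # largest exponent
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) = e - 1.
    _, e = math.frexp(a)
    exp = max(e - 1, min_exp)                   # below min_exp: subnormal spacing
    step = 2.0 ** (exp - mantissa_bits)         # grid spacing inside this binade
    q = round(a / step) * step
    max_val = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** max_exp
    return sign * min(q, max_val)


if __name__ == "__main__":
    scale = 1.0 / 64.0  # hypothetical INT8 scale; covers roughly [-2, 2)
    for v in (0.02, 0.3, 1.06, 1.9, 5.0, 101.0):
        print(v, quantize_int8(v, scale), quantize_fp8_like(v))
```

The INT8 grid has constant spacing, so its absolute error is uniform inside the clamping range, and values outside that range saturate. The FP8-like grid doubles its spacing with every power of two, giving finer resolution near zero and a wider range at the cost of coarser steps for large values.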
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Neural Network Applications
