A Study of BFLOAT16 for Deep Learning Training
Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan

TL;DR
This paper empirically demonstrates that BFLOAT16, a half-precision floating-point format with the same range as FP32, can be effectively used for deep learning training across various domains without hyper-parameter tuning.
Contribution
It provides the first comprehensive empirical analysis of BFLOAT16 for deep learning training, showing its efficacy and implementation details across multiple frameworks.
Findings
Training with BFLOAT16 reaches the same state-of-the-art (SOTA) results as FP32 training across the domains studied.
Training with BFLOAT16 requires no hyper-parameter tuning.
Conversion between BFLOAT16 and FP32 is simple, and training remains stable.
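To illustrate why conversion is considered simple: a BFLOAT16 value shares its sign bit and 8 exponent bits with FP32, so it corresponds to the upper 16 bits of the FP32 bit pattern. The following is a minimal sketch, not the paper's implementation. The function names are hypothetical, and the choice of round-to-nearest-even for the FP32-to-BFLOAT16 direction is an assumption of this example.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BFLOAT16 pattern for x (hypothetical helper).

    Assumes round-to-nearest-even on the 16 discarded mantissa bits.
    """
    # Reinterpret the FP32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]

    # NaN: keep the upper half and force a mantissa bit so it stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Round to nearest, ties to even: add 0x7FFF plus the LSB of the kept half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BFLOAT16 pattern to FP32 by appending 16 zero bits (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e-30, 3.0e38):
        b = fp32_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_fp32(b)!r}")
```

Widening back to FP32 is exact because it only appends zero bits. Narrowing loses mantissa precision (7 explicit bits instead of 23) but leaves the exponent field intact, except when rounding carries into it.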
Abstract
This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of IEEE 754 floating-point format (FP32) and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754 compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for…
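The range argument in the abstract can be checked with a little arithmetic. The sketch below derives the largest finite value and the smallest normal value of each format from its exponent and mantissa widths (FP32: 8/23, BFLOAT16: 8/7, FP16: 5/10). The helper names are hypothetical and the code is an illustration rather than anything taken from the paper.

```python
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary format (hypothetical helper)."""
    bias = 2 ** (exp_bits - 1) - 1
    # All mantissa bits set, largest non-reserved exponent.
    return (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** bias


def min_normal(exp_bits: int) -> float:
    """Smallest positive normal value: 2 ** (1 - bias)."""
    bias = 2 ** (exp_bits - 1) - 1
    return 2.0 ** (1 - bias)


# (exponent bits, explicit mantissa bits) for each format.
FORMATS = {"FP32": (8, 23), "BFLOAT16": (8, 7), "FP16": (5, 10)}

if __name__ == "__main__":
    for name, (e, m) in FORMATS.items():
        print(f"{name:9s} max finite = {max_finite(e, m):.4e}   "
              f"min normal = {min_normal(e):.4e}")
```

Because BFLOAT16 keeps the 8-bit exponent, its limits sit at roughly the same magnitudes as FP32, whereas the FP16 maximum is 65504. That narrower FP16 range is consistent with the abstract's point that FP16 training needs extra hyper-parameter tuning.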
Taxonomy
Topics: Tensor decomposition and applications · Computational Physics and Python Applications · Advanced Neural Network Applications
