Representation range needs for 16-bit neural network training
Valentina Popescu, Abhinav Venigalla, Di Wu, Robert Schreiber

TL;DR
This paper investigates the exponent range needed for 16-bit neural network training and proposes a new 16-bit format, 1/6/9 (1 sign bit, 6 exponent bits, 9 fraction bits), that improves the range-precision tradeoff and speeds up training without loss of accuracy.
Contribution
The paper introduces 1/6/9, a 16-bit format whose exponent width lies between those of binary16 and bfloat16. It balances range and precision for mixed-precision training and reduces the need for denormal numbers.
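To make the range-precision tradeoff concrete, the following sketch derives the usual limits of a binary floating-point format from its exponent and fraction widths and compares binary16, 1/6/9, and bfloat16. It assumes IEEE-754-style encoding for all three formats: an exponent bias of 2^(e-1) - 1, the all-ones exponent reserved for infinities and NaNs, and a hidden leading bit. The paper's exact encoding of 1/6/9 may differ. `FloatFormat` is a hypothetical helper, not code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Binary floating-point format; IEEE-754-style encoding is an assumption."""
    name: str
    exp_bits: int  # width of the exponent field
    man_bits: int  # explicit fraction bits (hidden leading 1 not counted)

    @property
    def bias(self) -> int:
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def emax(self) -> int:
        # Assumption: the all-ones exponent is reserved for inf/NaN, as in IEEE-754.
        return 2 ** self.exp_bits - 2 - self.bias

    @property
    def emin(self) -> int:
        return 1 - self.bias

    @property
    def max_normal(self) -> float:
        return (2.0 - 2.0 ** -self.man_bits) * 2.0 ** self.emax

    @property
    def min_normal(self) -> float:
        return 2.0 ** self.emin

    @property
    def min_subnormal(self) -> float:
        # Denormals extend the range below min_normal by man_bits binades.
        return 2.0 ** (self.emin - self.man_bits)

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next larger representable value.
        return 2.0 ** -self.man_bits


FORMATS = [
    FloatFormat("binary16 (1/5/10)", 5, 10),
    FloatFormat("1/6/9", 6, 9),
    FloatFormat("bfloat16 (1/8/7)", 8, 7),
]

if __name__ == "__main__":
    for f in FORMATS:
        print(f"{f.name:18} max={f.max_normal:.3e} min_normal={f.min_normal:.3e} "
              f"min_subnormal={f.min_subnormal:.3e} eps={f.epsilon:.3e}")
```

Under these assumptions, binary16 normals span exponents -14 to 15, 1/6/9 normals span -30 to 31, and bfloat16 normals span -126 to 127, while the spacing of values just above 1.0 grows from 2^-10 to 2^-9 to 2^-7. Each added exponent bit roughly doubles the normal exponent range at the cost of one bit of precision.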
Findings
The 1/6/9 format achieves numerical parity with standard mixed-precision training.
It speeds up training on hardware with slow or no denormal support.
The format effectively balances range and precision for neural network training.
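The denormal findings can be illustrated by simulating rounding into a narrow format, a common way to emulate low-precision arithmetic in software. The hypothetical `quantize` function below is a minimal sketch, not the authors' emulation code. It assumes IEEE-754-style encoding, round-to-nearest-even, overflow to infinity, and a simplified flush-to-zero mode that zeroes any rounded result below the smallest normal number.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int, denormals: bool = True) -> float:
    """Round x to the nearest value of a 1/exp_bits/man_bits format (a sketch)."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    # Assumption: IEEE-754-style bias and reserved all-ones exponent.
    bias = 2 ** (exp_bits - 1) - 1
    emin = 1 - bias
    emax = 2 ** exp_bits - 2 - bias
    max_normal = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    sign = math.copysign(1.0, x)
    a = abs(x)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so the IEEE-style exponent is e - 1.
    _, e = math.frexp(a)
    # Below emin the spacing stays fixed: this is the denormal range.
    exp = max(e - 1, emin)
    quantum = 2.0 ** (exp - man_bits)  # spacing of representable values here
    q = round(a / quantum) * quantum   # Python's round() breaks ties to even
    if q > max_normal:
        return sign * math.inf         # overflow
    if q < 2.0 ** emin and not denormals:
        return sign * 0.0              # simplified flush-to-zero
    return sign * q
```

For example, 2^-24 is the smallest binary16 denormal, so `quantize(2.0**-24, 5, 10)` keeps it only when denormals are enabled and returns 0.0 with `denormals=False`; in 1/6/9 the same value is an ordinary normal number and survives either way. The price is precision: 1 + 2^-10 is exact in binary16 but rounds to 1.0 in 1/6/9.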
Abstract
Deep learning has grown rapidly thanks to its state-of-the-art performance across a wide range of real-world applications. While neural networks have been trained using IEEE-754 binary32 arithmetic, the rapid growth of computational demands in deep learning has boosted interest in faster, low-precision training. Mixed-precision training that combines IEEE-754 binary16 with IEEE-754 binary32 has been tried, and other 16-bit formats, for example Google's bfloat16, have become popular. In floating-point arithmetic there is a tradeoff between precision and representation range as the number of exponent bits changes; denormal numbers extend the representation range. This raises questions of how much exponent range is needed, of whether there is a format between binary16 (5 exponent bits) and bfloat16 (8 exponent bits) that works better than either of them, and whether or not denormals are…
Taxonomy
Topics: Numerical Methods and Algorithms · Machine Learning and Algorithms · Neural Networks and Applications
