Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

TL;DR
This paper presents a method for training deep neural networks using half-precision floating point numbers, significantly reducing memory usage and enabling faster computation without sacrificing accuracy.
Contribution
The authors introduce a novel mixed-precision training technique that maintains a single-precision copy of weights and scales the loss to effectively train large models with half-precision arithmetic.
Findings
Memory consumption reduced by nearly 2x
Effective training of models with over 100 million parameters
Compatible with various neural network architectures
Abstract
Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of…
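The two techniques named in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes plain SGD on a single scalar weight, uses Python's `struct` format `"e"` to round values to IEEE half precision, and lets an ordinary Python float (double precision) stand in for the single-precision master copy. The helper `to_fp16`, the constant `LOSS_SCALE`, and all numeric values are hypothetical and chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# --- 1) Higher-precision master copy of the weights -------------------------
# Near 1.0 the gap between adjacent FP16 values is about 5e-4 to 1e-3,
# so an update of 1e-4 is rounded away if the weight itself lives in FP16.
lr, grad = 1e-2, 1e-2          # hypothetical values; update = lr * grad = 1e-4

w_fp16_only = to_fp16(1.0)     # weight stored and updated purely in FP16
w_master = 1.0                 # stand-in for the single-precision master copy

for _ in range(100):
    # FP16-only update: the result is rounded back to FP16 at every step.
    w_fp16_only = to_fp16(w_fp16_only - to_fp16(lr * grad))
    # Master-copy update: accumulate in higher precision.
    w_master -= lr * grad

# The half-precision weight used for compute is rounded from the master copy.
w_fp16_from_master = to_fp16(w_master)

print("FP16-only weight:       ", w_fp16_only)
print("FP16 weight from master:", w_fp16_from_master)

# --- 2) Loss scaling ---------------------------------------------------------
# A gradient this small is below the FP16 representable range and becomes 0.
# In training the loss is multiplied by the scale before backpropagation, so
# every gradient is scaled by the chain rule; here the gradient is scaled
# directly as a shortcut.
LOSS_SCALE = 1024.0            # hypothetical scale factor (a power of two)
tiny_grad = 1e-8

direct_fp16 = to_fp16(tiny_grad)                # underflows to zero
scaled_fp16 = to_fp16(tiny_grad * LOSS_SCALE)   # survives in FP16
recovered = scaled_fp16 / LOSS_SCALE            # unscale in higher precision

print("Gradient stored directly in FP16:", direct_fp16)
print("Gradient recovered after scaling:", recovered)
```

In the first part the FP16-only weight never moves, because each individual update is smaller than half the spacing between representable values, while the master copy accumulates all 100 updates before being rounded. In the second part the unscaled gradient is flushed to zero, whereas the scaled one is recovered to within the rounding error of half precision.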
Taxonomy
Topics: Model Reduction and Neural Networks · Numerical Methods and Algorithms · Advanced Neural Network Applications
Methods: Convolution
