Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

arXiv:1710.03740·cs.AI·February 19, 2018·877 cites

TL;DR

This paper presents a method for training deep neural networks using half-precision floating-point numbers, reducing memory usage by nearly 2x and enabling faster computation without sacrificing accuracy.

Contribution

The authors introduce a novel mixed-precision training technique that maintains a single-precision copy of weights and scales the loss to effectively train large models with half-precision arithmetic.
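The two ingredients named above, a single-precision master copy of the weights and loss scaling, can be sketched on a toy problem. The following is a minimal NumPy illustration, not the authors' implementation: the linear least-squares model, the function name `mixed_precision_step`, the learning rate, the fixed loss scale of 1024, and the skip-on-overflow check are all assumptions made for the example.

```python
import numpy as np

def mixed_precision_step(master_w, x, y, lr=0.1, loss_scale=1024.0):
    """One SGD step on a toy linear least-squares model (illustrative only).

    master_w is the float32 master copy of the weights; the forward and
    backward passes run in float16. All names and defaults are hypothetical.
    """
    # Round the float32 master weights (and the data) to half precision.
    w16 = master_w.astype(np.float16)
    x16 = x.astype(np.float16)
    y16 = y.astype(np.float16)

    # Forward pass in float16.
    err = x16 @ w16 - y16

    # Backward pass of the scaled loss: multiplying the loss by loss_scale
    # multiplies every gradient by the same factor (chain rule).
    scaled_err = err * np.float16(loss_scale / len(x))
    grad16 = x16.T @ scaled_err

    # If scaling pushed a gradient past the float16 range, skip the update.
    if not np.all(np.isfinite(grad16)):
        return master_w, False

    # Unscale in float32 and apply the update to the float32 master copy.
    grad32 = grad16.astype(np.float32) / np.float32(loss_scale)
    return master_w - np.float32(lr) * grad32, True
```

Calling the function repeatedly with the returned weights performs training; the half-precision weights are re-derived from the float32 copy at every step, so small updates accumulate in single precision rather than being rounded away.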

Findings

01. Memory consumption reduced by nearly 2x
02. Effective training of models over 100 million parameters
03. Compatible with various neural network architectures

Abstract

Deep neural networks have enabled progress in a wide variety of applications. Growing the size of the neural network typically results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. We introduce a technique to train deep neural networks using half precision floating point numbers. In our technique, weights, activations and gradients are stored in IEEE half-precision format. Half-precision floating numbers have limited numerical range compared to single-precision numbers. We propose two techniques to handle this loss of information. Firstly, we recommend maintaining a single-precision copy of the weights that accumulates the gradients after each optimizer step. This single-precision copy is rounded to half-precision format during training. Secondly, we propose scaling the loss appropriately to handle the loss of…
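The limited range and precision of half-precision numbers mentioned in the abstract can be seen directly in NumPy. The snippet below is only an illustration: the specific values (a gradient of 1e-8, an update of 1e-4, and a scale factor of 1024) are chosen for the example and are not taken from the paper.

```python
import numpy as np

# A very small gradient value underflows to zero when stored in float16 ...
tiny_grad = 1e-8
underflowed = np.float16(tiny_grad)

# ... but remains nonzero if it is scaled up before the conversion
# (1024 is an arbitrary illustrative scale factor).
scaled = np.float16(tiny_grad * 1024.0)

# A small update added to a float16 weight is rounded away,
# while the same update is retained in a float32 copy of the weight.
w16 = np.float16(1.0) + np.float16(1e-4)
w32 = np.float32(1.0) + np.float32(1e-4)

print("gradient lost in float16:", underflowed == 0)
print("scaled gradient survives:", scaled > 0)
print("float16 weight unchanged:", w16 == np.float16(1.0))
print("float32 weight updated:  ", w32 > np.float32(1.0))
```

The first pair of values motivates scaling the loss, and the second pair motivates keeping a single-precision copy of the weights.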

Taxonomy

Topics: Model Reduction and Neural Networks · Numerical Methods and Algorithms · Advanced Neural Network Applications

Methods: Convolution