Mixed Precision Training With 8-bit Floating Point

Naveen Mellempudi; Sudarshan Srinivasan; Dipankar Das; Bharat Kaul

arXiv:1905.12334 · cs.LG · May 30, 2019 · 42 citations

Open Access

TL;DR

This paper proposes a method for training deep neural networks with 8-bit floating point weights, activations, errors, and gradients, while also reducing the master copy of weights from 32-bit to 16-bit. It reports state-of-the-art accuracy on ImageNet-1K and WMT16.

Contribution

The paper presents an approach to 8-bit floating point training that combines loss scaling with stochastic rounding, allowing reduced-precision training to reach state-of-the-art accuracy.
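As a rough illustration of the stochastic rounding idea, the sketch below rounds a value to a uniform grid, choosing the upper neighbour with probability equal to the fractional distance so that the result is unbiased in expectation. The function name `stochastic_round`, the `step` parameter, and the uniform grid are assumptions made for illustration; a real 8-bit floating point format has non-uniform spacing, and this is not the paper's implementation.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step (hypothetical uniform grid).

    The upper neighbour is chosen with probability equal to the
    fractional distance from the lower neighbour, so the expected
    value of the result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower          # in [0, 1)
    if rng.random() < frac:        # round up with probability frac
        lower += 1
    return lower * step


# Usage: a value well below half a step survives on average,
# whereas round-to-nearest would always return 0.0.
rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Each individual sample is either 0.0 or 1.0, but the average over many roundings stays close to 0.3. This is the property that keeps small weight updates from being systematically lost at low precision.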

Findings

01

Achieved state-of-the-art accuracy on ImageNet and WMT16 datasets.

02

Demonstrated effective training of models such as ResNet and Transformer at 8-bit precision.

03

Reported slightly higher validation accuracy than the full-precision baseline.

Abstract

Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit precision, with significant gains in performance and energy efficiency. However, attempts to train DNNs at 8-bit precision have met with significant challenges because of the higher precision and dynamic range requirements of back-propagation. In this paper, we propose a method to train deep neural networks using 8-bit floating point representation for weights, activations, errors, and gradients. In addition to reducing compute precision, we also reduced the precision requirements for the master copy of weights from 32-bit to 16-bit. We demonstrate state-of-the-art accuracy across multiple data sets (imagenet-1K, WMT16) and a broader set of workloads…

Taxonomy

Topics: Robotic Mechanisms and Dynamics · Astronomical Observations and Instrumentation · Image and Object Detection Techniques