Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Miloš Nikolić; Enrique Torres Sanchez; Jiahui Wang; Ali Hadi Zadeh; Mostafa Mahmoud; Ameer Abdelhadi; Kareem Ibrahim; Andreas Moshovos

arXiv:2204.13666 · cs.LG · May 20, 2024

TL;DR

This paper introduces dynamic, adaptive floating-point container methods for neural network training that automatically optimize precision to reduce memory footprint and energy use without sacrificing accuracy.

Contribution

It presents novel machine learning-based and loss-based methods for automatically adjusting floating-point precisions during training, eliminating manual trial-and-error.

Findings

1. Quantum Mantissa and Quantum Exponent reduce footprint by 4.74x.
2. BitWave achieves a 3.19x reduction in footprint.
3. Gecko losslessly compresses exponents, further improving compression rates.

Abstract

The transfer of tensors from/to memory during neural network training dominates time and energy. To improve energy efficiency and performance, research has been exploring ways to use narrower data representations. So far, these attempts have relied on user-directed trial-and-error to achieve convergence. We present methods that relieve users of this responsibility. Our methods dynamically adjust the size and format of the floating-point containers used for activations and weights during training, achieving adaptivity across three dimensions: i) which datatype to use, ii) on which tensor, and iii) how it changes over time. The different meanings and distributions of exponents and mantissas lead us to tailored approaches for each. We present two lossy pairs of methods to eliminate as many mantissa and exponent bits as possible without affecting accuracy. Quantum Mantissa and Quantum Exponent…
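To make "eliminating mantissa bits" concrete, here is a minimal Python sketch of what dropping low-order mantissa bits does to a single value. It is an illustration only, not the paper's method: the FP32 baseline (1 sign, 8 exponent, 23 mantissa bits), the round-toward-zero behaviour, and the helper name `truncate_mantissa` are all assumptions made for this example.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Keep only the top `keep_bits` mantissa bits of x, viewed as float32.

    Hypothetical helper for illustration; it truncates (rounds toward zero)
    and leaves the sign and exponent fields untouched.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 encoding as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Clear the low (23 - keep_bits) mantissa bits.
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


if __name__ == "__main__":
    # 1.75 is 1.11 in binary; keeping one mantissa bit leaves 1.1 (= 1.5).
    print(truncate_mantissa(1.75, 1))
    print(truncate_mantissa(1.75, 2))
```

In a container that stores only the kept bits, each dropped mantissa bit saves one bit per value in memory. The methods described in the abstract decide how many bits to keep per tensor and over time, and that decision logic is not shown here.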

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Stochastic Gradient Optimization Techniques