Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training
Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Kareem Ibrahim, Andreas Moshovos

TL;DR
This paper introduces dynamic, adaptive floating-point container methods for neural network training that automatically optimize precision to reduce memory footprint and energy use without sacrificing accuracy.
Contribution
It presents novel machine learning-based and loss-based methods for automatically adjusting floating-point precisions during training, eliminating manual trial-and-error.
Findings
Quantum Mantissa and Quantum Exponent reduce footprint by 4.74x.
BitWave achieves a 3.19x reduction in footprint.
Gecko losslessly compresses exponents, further improving compression rates.
Abstract
The transfer of tensors from/to memory during neural network training dominates time and energy. To improve energy efficiency and performance, research has been exploring ways to use narrower data representations. So far, these attempts relied on user-directed trial-and-error to achieve convergence. We present methods that relieve users from this responsibility. Our methods dynamically adjust the size and format of the floating-point containers used for activations and weights during training, achieving adaptivity across three dimensions: i) which datatype to use, ii) on which tensor, and iii) how it changes over time. The different meanings and distributions of exponents and mantissas lead us to tailored approaches for each. We present two lossy pairs of methods to eliminate as many mantissa and exponent bits as possible without affecting accuracy. Quantum Mantissa and Quantum Exponent…
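To make "eliminating mantissa bits" concrete, here is a minimal sketch of what dropping low-order mantissa bits does to a single float32 value. It is not the paper's method: the function name `truncate_mantissa` and the parameter `keep_bits` are hypothetical, the bit count is fixed by hand rather than learned or driven by the loss, and plain truncation is assumed instead of any particular rounding scheme.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Keep only the top `keep_bits` of the 23 explicit float32 mantissa bits.

    Illustrative assumption: bits are truncated (rounded toward zero);
    the sign and exponent fields are left untouched.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Build a mask that clears the low (23 - keep_bits) mantissa bits.
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    # Reinterpret the masked integer as a float32 again.
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


# 1.75 is 1.11 in binary; keeping one mantissa bit leaves 1.1 in binary.
print(truncate_mantissa(1.75, 1))   # 1.5
print(truncate_mantissa(1.75, 2))   # 1.75
```

A value stored this way needs only 1 sign bit, the exponent bits, and `keep_bits` mantissa bits, which is where the footprint reduction comes from; the methods summarized above decide how many bits to keep per tensor and over time.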
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
