Vectorization of Multibyte Floating Point Data Formats
Andrew Anderson, David Gregg

TL;DR
This paper introduces a flexible reduced-precision floating point scheme that can be efficiently accelerated using existing hardware vector units on general-purpose processors, reducing storage and transfer costs.
Contribution
It presents a novel continuum of reduced-precision formats for floating point data and demonstrates a hardware-accelerated implementation, via compiler support, on general-purpose processors (GPPs).
Findings
Supports lower precision floating point with low overhead
Enables reduced storage and transfer volume
Achieves acceleration using existing vector hardware
Abstract
We propose a scheme for reduced-precision representation of floating point data on a continuum between IEEE-754 floating point types. Our scheme enables the use of lower precision formats to reduce storage space requirements and data transfer volume. We describe how our scheme can be accelerated using existing hardware vector units on a general-purpose processor (GPP). Exploiting native vector hardware allows us to support reduced-precision floating point with low overhead. We demonstrate that supporting reduced precision in the compiler, as opposed to using a library approach, can yield a low-overhead solution for GPPs.
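To make the idea of a format "between" IEEE-754 types concrete, here is a minimal sketch of one possible 3-byte format. It is an illustration under stated assumptions, not the authors' scheme: the abstract does not specify a bit layout, so the sketch simply keeps the top 24 bits of the binary32 encoding (sign, 8 exponent bits, 15 mantissa bits) and truncates the rest. The names `pack_f24` and `unpack_f24` are hypothetical.

```python
import struct

def pack_f24(x: float) -> bytes:
    """Store x in 3 bytes by keeping the top 24 bits of its binary32 encoding.

    Assumption: the low 8 mantissa bits are truncated (no rounding).
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # binary32 bit pattern
    return (bits >> 8).to_bytes(3, "big")                 # drop low 8 mantissa bits

def unpack_f24(b: bytes) -> float:
    """Expand a 3-byte value back to binary32 by zero-filling the dropped bits."""
    bits = int.from_bytes(b, "big") << 8
    return struct.unpack(">f", struct.pack(">I", bits))[0]

if __name__ == "__main__":
    value = 3.14159
    stored = pack_f24(value)
    print(len(stored), unpack_f24(stored))
```

Each value occupies 3 bytes instead of 4, a 25% saving in storage and transfer volume, at the cost of 8 bits of mantissa precision. This scalar version converts one value at a time; the paper's point is that such conversions can be done in bulk with existing vector hardware.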
