Compressed Real Numbers for AI: a case-study using a RISC-V CPU
Federico Rossi, Marco Cococcioni, Roger Ferrer Ibáñez, Jesús Labarta, Filippo Mantovani, Marc Casas, Emanuele Ruffaldi, Sergio Saponara

TL;DR
This paper explores the use of compressed bfloat and posit formats for neural network inference on RISC-V CPUs, proposing a decompression method to improve bandwidth and cache efficiency without degrading accuracy.
Contribution
It introduces a decompression approach for bfloat and posit formats on RISC-V CPUs, optimizing inference performance by reducing bandwidth and cache usage.
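Decompression here means widening each compressed value back to binary32 before it reaches the 32-bit FPU. For bfloat16 this is especially cheap: a bfloat16 is the upper 16 bits of a binary32 (sign, 8-bit exponent, 7-bit mantissa), so widening is a 16-bit left shift. The minimal Python sketch below shows the bit-level conversion in both directions and a dot product that keeps weights compressed until each multiply-accumulate. The function names are hypothetical, round-to-nearest-even for compression is an assumption, and the scalar loop only stands in for the vectorized RISC-V decompression the paper describes; it does not reproduce it.

```python
import math
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_to_f32(h: int) -> float:
    """Decompress: bfloat16 is the top half of a binary32, so shift left by 16."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]

def f32_to_bf16(x: float) -> int:
    """Compress binary32 to bfloat16 with round-to-nearest-even (assumed policy)."""
    if math.isnan(x):
        return 0x7FC0  # canonical quiet NaN
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)  # round on the 16 discarded mantissa bits
    return (b >> 16) & 0xFFFF

def dot_bf16(weights_bf16, activations) -> float:
    """Dot product with weights kept compressed until each multiply-accumulate.

    Python floats are binary64, so the accumulation here is wider than a
    binary32 FPU; the sketch only illustrates the on-the-fly widening.
    """
    acc = 0.0
    for h, a in zip(weights_bf16, activations):
        acc += bf16_to_f32(h) * a  # widen just before use
    return acc
```

Since widening is only a shift, a vector unit can expand many bfloat16 lanes per instruction, while each stored weight occupies half the bytes of a binary32.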
Findings
Keeping operands in compressed bfloat/posit formats and decompressing them on the CPU improves memory-bandwidth efficiency.
Proposed method maintains neural network accuracy.
Architectural parameters favoring the compressed approach are identified.
Abstract
As recently demonstrated, Deep Neural Networks (DNNs), usually trained using single-precision IEEE 754 floating-point numbers (binary32), can also work at lower precision. Therefore, 16-bit and 8-bit compressed formats have attracted considerable attention. In this paper, we focus on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications without appreciable degradation of accuracy: bfloat and posit. Even though 16-bit and 8-bit bfloat/posit are routinely used to reduce the storage of the weights/biases of trained DNNs, inference still often happens on the 32-bit FPU of the CPU (especially when GPUs are not available). In this paper we propose a way to decompress a tensor of bfloats/posits just before computation, i.e., after the compressed operands have been loaded within the vector registers of a…
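Posits are harder to decompress than bfloat because the regime field has variable length. The sketch below decodes a posit<n, es> bit pattern into a Python float using the standard posit layout (sign, run-length-encoded regime k, es exponent bits e, fraction f with a hidden leading 1), whose value is (-1)^s · useed^k · 2^e · (1 + f) with useed = 2^(2^es). The defaults n = 8, es = 0, the function names and the 256-entry lookup table are illustrative assumptions, not necessarily the configuration or the vector-register technique used in the paper.

```python
import math

def posit_to_float(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan  # NaR (Not a Real) has no numeric value
    negative = (bits >> (n - 1)) & 1
    if negative:
        bits = (-bits) & mask  # two's complement yields the magnitude pattern
    # Regime: run of identical bits right after the sign bit.
    i = n - 2
    first = (bits >> i) & 1
    run = 0
    while i >= 0 and ((bits >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1  # skip the regime-terminating bit (may already be past the end)
    # Exponent: up to es bits; bits cut off by the word end count as zero.
    e = 0
    for _ in range(es):
        e <<= 1
        if i >= 0:
            e |= (bits >> i) & 1
            i -= 1
    # Whatever is left is the fraction, with an implicit leading 1.
    nf = max(i + 1, 0)
    frac = bits & ((1 << nf) - 1)
    value = (1 + frac / (1 << nf)) * 2.0 ** (k * (1 << es) + e)
    return -value if negative else value

# Hypothetical bulk path for 8-bit posits: one table lookup per stored byte.
POSIT8_LUT = [posit_to_float(b) for b in range(256)]

def decompress_posit8(tensor_bytes: bytes) -> list:
    """Expand a tensor of posit<8,0> bytes into floats via the lookup table."""
    return [POSIT8_LUT[b] for b in tensor_bytes]
```

For example, `decompress_posit8(bytes([0x40, 0xC0, 0x60]))` expands to `[1.0, -1.0, 2.0]`. A table is only one simple option for 8-bit formats; an equivalent bit-manipulation sequence would be needed for wider posits.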
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Tensor decomposition and applications
