MXDOTP: A RISC-V ISA Extension for Enabling Microscaling (MX) Floating-Point Dot Products
Gamze İslamoğlu, Luca Bertaccini, Arpan Suravi Prasad, Francesco Conti, Angelo Garofalo, Luca Benini

TL;DR
This paper introduces MXDOTP, a RISC-V ISA extension for efficient MX floating-point dot products. It reports up to a 25x speedup and 12.5x better energy efficiency on AI matrix multiplication relative to a software baseline, with minimal hardware modifications.
Contribution
It presents the first RISC-V ISA extension for MX dot products, enabling high-utilization, low-power matrix operations on 8-bit MXFP8 formats with minimal hardware impact.
Findings
Achieves up to 356 GFLOPS/W in 12 nm FinFET hardware.
Provides a 25x speedup over the software baseline.
Offers 12.5x better energy efficiency than the same software baseline.
Abstract
Fast and energy-efficient low-bitwidth floating-point (FP) arithmetic is essential for Artificial Intelligence (AI) systems. Microscaling (MX) standardized formats have recently emerged as a promising alternative to baseline low-bitwidth FP formats, offering improved accuracy with a block-wise shared exponent scale combined with per-element values. However, efficiently executing the key linear algebra primitives for AI applications on MX formats requires specialized hardware support for the fundamental operators such as scaled dot product. In this work, we propose MXDOTP, the first RISC-V ISA extension for MX dot products, focusing on the 8-bit MXFP8 FP format. We extend the open-source Snitch RISC-V core with a dedicated MXFP8 dot product-accumulate unit, which fully consumes blocks of eight 8-bit operands packed into 64-bit inputs. To feed MXDOTP at full utilization with four operands…
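To make the scaled dot product described in the abstract concrete, the following is a minimal Python sketch of one block-level operation: eight 8-bit elements per operand packed into a 64-bit word, one shared exponent scale per block, and an FP32 accumulator. It is an illustration under stated assumptions, not the paper's instruction semantics. It assumes the E4M3 element encoding (MXFP8 also permits E5M2), an E8M0 shared scale with bias 127, and least-significant-byte-first packing. The names `decode_e4m3`, `unpack8`, and `mx_dotp` are hypothetical, and rounding happens once at the end through a Python double, which may differ from the hardware unit.

```python
import struct

E4M3_BIAS = 7     # exponent bias of the FP8 E4M3 element format (assumed here)
E8M0_BIAS = 127   # exponent bias of the shared block scale


def decode_e4m3(byte):
    """Decode one FP8 E4M3 byte: 1 sign, 4 exponent, 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # E4M3 encodes NaN but no infinities
    if exp == 0:
        return sign * man * 2.0 ** (1 - E4M3_BIAS - 3)  # subnormal
    return sign * (1 + man / 8) * 2.0 ** (exp - E4M3_BIAS)


def unpack8(word):
    """Split a 64-bit word into eight bytes, least-significant first (assumed order)."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def to_fp32(x):
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mx_dotp(a_word, b_word, scale_a, scale_b, acc):
    """Hypothetical model of one scaled dot product-accumulate step.

    a_word, b_word: 64-bit integers holding eight E4M3 elements each.
    scale_a, scale_b: 8-bit E8M0 block scales (value 2**(s - 127)).
    acc: FP32 accumulator value.
    """
    if scale_a == 0xFF or scale_b == 0xFF:
        return float("nan")  # 0xFF is the NaN encoding of E8M0
    a = [decode_e4m3(v) for v in unpack8(a_word)]
    b = [decode_e4m3(v) for v in unpack8(b_word)]
    dot = sum(x * y for x, y in zip(a, b))
    scale = 2.0 ** ((scale_a - E8M0_BIAS) + (scale_b - E8M0_BIAS))
    return to_fp32(acc + scale * dot)


# Eight elements of 1.0 (0x38) times eight elements of 2.0 (0x40),
# block scales 2**0 and 2**1, accumulator 1.0: 1 + 2 * 16 = 33.
print(mx_dotp(0x3838383838383838, 0x4040404040404040, 127, 128, 1.0))  # 33.0
```

Because both scales are powers of two, they can be combined by adding exponents and applied once to the block sum instead of once per element, which is the property a fused dot product unit can exploit.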
Taxonomy
Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design
