VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration
Max Wipfli, Gamze İslamoğlu, Navaneeth Kunhi Purayil, Angelo Garofalo, Luca Benini

TL;DR
This paper introduces VMXDOTP, a RISC-V Vector ISA extension that accelerates neural network computations in microscaling (MX) formats, improving utilization and energy efficiency in vector processing element (VPE) clusters.
Contribution
The paper presents VMXDOTP, a novel ISA extension that enables efficient MX dot product operations, supporting variable block sizes and achieving high utilization and energy efficiency.
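The contribution centers on the MX dot product. As a rough software reference for its semantics, the Python sketch below follows the OCP Microscaling Formats (MX) v1.0 specification as we understand it: each block of `block_size` elements shares one 8-bit E8M0 power-of-two scale, products are summed within a block, and the two block scales are then applied by adding exponents. The function name `mx_dot` is hypothetical, elements are assumed to be already decoded to floats, and the E8M0 NaN code (0xFF) is not handled. This is not the VMXDOTP instruction encoding or the paper's hardware datapath.

```python
import math
from typing import Sequence

E8M0_BIAS = 127  # bias of the shared E8M0 scale exponent (OCP MX spec)


def mx_dot(scales_a: Sequence[int], elems_a: Sequence[float],
           scales_b: Sequence[int], elems_b: Sequence[float],
           block_size: int) -> float:
    """Hypothetical reference MX dot product (software model only).

    Every block of `block_size` elements shares one E8M0 scale byte.
    Elements are assumed to be already decoded to floats
    (e.g. from FP8 E4M3 or FP4 E2M1); E8M0 NaN (0xFF) is not handled.
    """
    if len(elems_a) != len(elems_b):
        raise ValueError("vectors must have equal length")
    if block_size <= 0 or len(elems_a) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    n_blocks = len(elems_a) // block_size
    if len(scales_a) != n_blocks or len(scales_b) != n_blocks:
        raise ValueError("need exactly one scale per block")

    acc = 0.0
    for k in range(n_blocks):
        lo, hi = k * block_size, (k + 1) * block_size
        # Step 1: sum the low-precision element products inside the block.
        partial = sum(a * b for a, b in zip(elems_a[lo:hi], elems_b[lo:hi]))
        # Step 2: apply both power-of-two block scales by adding exponents,
        # then accumulate into the wide (here: Python float) accumulator.
        exp = (scales_a[k] - E8M0_BIAS) + (scales_b[k] - E8M0_BIAS)
        acc += math.ldexp(partial, exp)
    return acc
```

For example, `mx_dot([128, 127], [1.0, 2.0, 0.5, 0.5], [127, 127], [1.0, 1.0, 2.0, 2.0], block_size=2)` gives 6.0 from the first block (partial sum 3, scale 2^1) plus 2.0 from the second, i.e. 8.0. Changing `block_size` only changes how many products share a scale pair, which is the degree of freedom that variable block-size support exposes; the two-step structure (intra-block sum, then scaling and accumulation) illustrates the multi-step mixed-precision operation that does not map directly onto a plain vector multiply-accumulate.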
Findings
Achieves up to 97% VPE cluster utilization on MX-MatMul.
Delivers up to 125 MXFP8-GFLOPS and 250 MXFP4-GFLOPS.
Yields up to 7.0x speedup and 4.9x higher energy efficiency than software emulation.
Abstract
Compared to the first generation of deep neural networks, dominated by regular, compute-intensive kernels such as matrix multiplications (MatMuls) and convolutions, modern decoder-based transformers interleave attention, normalization, and data-dependent control flow. This demands flexible accelerators, a requirement met by scalable, highly energy-efficient shared-L1-memory vector processing element (VPE) clusters. Meanwhile, the ever-growing size and bandwidth needs of state-of-the-art models make reduced-precision formats increasingly attractive. Microscaling (MX) data formats, based on block floating-point (BFP) representations, have emerged as a promising solution to reduce data volumes while preserving accuracy. However, MX semantics are poorly aligned with vector execution: block scaling and multi-step mixed-precision operations break the regularity of vector pipelines, leading to…
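To make the block floating-point structure concrete, here is a minimal Python sketch of encoding one block into MXFP4, assuming the OCP MX v1.0 conversion rule: the shared scale is 2^(floor(log2(max|v|)) - emax_elem), with emax_elem = 2 for FP4 E2M1, and each scaled element is rounded to the nearest E2M1 value (ties to even) with saturation at ±6. The function names and the choice of scale 1 for an all-zero block are assumptions for illustration, and elements are returned as decoded values rather than packed 4-bit codes.

```python
import math
from typing import List, Tuple

# FP4 E2M1 magnitudes; the largest normal value is 6.0 = 1.5 * 2**2.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
# Values whose mantissa bit is 0; they win round-to-nearest-even ties.
E2M1_EVEN = {0.0, 1.0, 2.0, 4.0}
E2M1_EMAX = 2      # exponent of the largest normal E2M1 value
E8M0_BIAS = 127    # bias of the shared scale exponent


def round_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, ties to even, saturating at +/-6."""
    mag = min(abs(x), 6.0)
    best = min(E2M1_GRID, key=lambda c: (abs(mag - c), c not in E2M1_EVEN))
    return math.copysign(best, x)


def quantize_mxfp4_block(block: List[float]) -> Tuple[int, List[float]]:
    """Hypothetical encoder: one block -> (E8M0 scale byte, E2M1 values)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        shared_exp = 0  # assumption: an all-zero block simply uses scale 1
    else:
        _, e = math.frexp(amax)            # amax = m * 2**e, 0.5 <= m < 1
        shared_exp = (e - 1) - E2M1_EMAX   # floor(log2(amax)) - emax_elem
    shared_exp = max(-127, min(127, shared_exp))  # representable E8M0 range
    scale = math.ldexp(1.0, shared_exp)
    elems = [round_e2m1(v / scale) for v in block]
    return shared_exp + E8M0_BIAS, elems
```

For example, the block `[12.0, 3.0]` gets scale 2^1 (E8M0 byte 128) and elements `[6.0, 1.5]`, which dequantize exactly, whereas in `[0.3, -1.2, 5.0, 0.0]` (scale 1) the value 5.0 sits on a tie and rounds to 4.0. Storing one 8-bit scale per block instead of per element is what reduces data volume, while the per-block scale keeps the small element formats within their useful range.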
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Low-power high-performance VLSI design
