Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors
Ruimin Shi, Maya Gokhale, Pei-Hung Lin, Xavier Teruel, Ivy Peng

TL;DR
This paper evaluates the performance and compiler support of hardware implementing the RISC-V Vector Extension (RVV) 1.0, identifying performance challenges and comparing autovectorization effectiveness in scientific and machine learning workloads.
Contribution
It introduces assembly microbenchmarks for RVV, assesses GCC and LLVM autovectorization, and analyzes RVV support for complex applications like quantum simulation.
Findings
Predication overhead and stride load limit performance.
GCC 15 outperforms LLVM 21 in most applications.
Default LMUL selection is near optimal.
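To make the first finding concrete, the following is a minimal sketch of the three loop shapes involved; it is not taken from the paper, and the function names are hypothetical. Assuming a build along the lines of -O3 -march=rv64gcv, an autovectorizer may lower the first loop to unit-stride loads (vle32.v), the second to strided loads (vlse32.v), and the third to masked (predicated) stores. Whether a given GCC or LLVM release actually vectorizes each loop, and with which instructions, depends on its cost model and flags.

```c
#include <stddef.h>

/* Unit-stride access: consecutive elements of x are read.
   A vectorizer can typically use unit-stride vector loads here. */
void scale_unit(float *restrict y, const float *restrict x,
                float a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i];
    }
}

/* Strided access: every stride-th element of x is read.
   If vectorized, this usually needs strided loads, which the
   findings above identify as a performance challenge. */
void scale_strided(float *restrict y, const float *restrict x,
                   float a, size_t n, size_t stride) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i * stride];
    }
}

/* Conditional store: only positive elements are copied.
   If vectorized, the branch becomes a mask, so the store is
   predicated; this is the kind of predication whose overhead
   the findings above refer to. */
void copy_positive(float *restrict y, const float *restrict x,
                   size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0f) {
            y[i] = x[i];
        }
    }
}
```

Inspecting the generated assembly for these loops under each compiler is one simple way to see which form of load and masking was chosen.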
Abstract
The RISC-V Vector Extension (RVV) is a cornerstone for supporting compute throughput in scientific and machine learning workloads. Yet compiler support and performance monitoring on real RVV 1.0 hardware are still evolving. In this work, we design a suite of assembly microbenchmarks to establish performance ceilings and calibrate performance counters on RVV hardware. Leveraging the assembly benchmarks, we find that predication overhead and stride load pose performance challenges that current compiler cost models do not yet fully address. Moreover, we present the first evaluation of GCC 15 and LLVM 21 autovectorization in HPC and ML proxy applications. GCC 15 outperforms LLVM 21 in four out of six applications. LLVM 21 only outperforms GCC 15 in SGEMM and DGEMM, driven by more aggressive instruction reduction confirmed through validated perf counters on the RVV hardware. We…
