Optimizing Structured-Sparse Matrix Multiplication in RISC-V Vector Processors
Vasileios Titopoulos, Kosmas Alexandridis, Christodoulos Peltekis, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos

TL;DR
This paper extends RISC-V vector processors with a new instruction, vindexmac, that speeds up structured-sparse matrix multiplication in ML applications such as CNNs, improving runtime by 25-33%.
Contribution
It introduces the vindexmac instruction to optimize structured-sparse matrix multiplication on RISC-V vector processors, with minimal hardware cost and substantial performance gains.
Findings
vindexmac reduces instruction count per matrix multiplication iteration
Runtime improves by 25-33% with the new instruction
Performance scales effectively for CNN workloads
Abstract
Structured sparsity has been proposed as an efficient way to prune the complexity of Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. Accelerating ML models, whether for training, or inference, heavily relies on matrix multiplications that can be efficiently executed on vector processors, or custom matrix engines. This work aims to integrate the simplicity of structured sparsity into vector execution to speed up the corresponding matrix multiplications. Initially, the implementation of structured-sparse matrix multiplication using the current RISC-V instruction set vector extension is comprehensively explored. Critical parameters that affect performance, such as the impact of data distribution across the scalar and vector register files, data locality, and the effectiveness of loop unrolling are analyzed both qualitatively and quantitatively.…
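To make the abstract's central operation concrete, the following is a minimal sketch of structured-sparse matrix multiplication in plain Python. It assumes an N:M block pattern (at most n nonzeros in every block of m consecutive columns), which is one common form of structured sparsity; the text above does not state which pattern the paper uses. The function names (compress_n_m, sparse_matmul, dense_matmul) and the storage layout are hypothetical and chosen for illustration only. This is not the paper's RISC-V kernel and does not model the vindexmac instruction.

```python
def compress_n_m(dense, n=2, m=4):
    """Compress each row into (values, column indices), assuming an N:M pattern.

    Every block of m consecutive columns may hold at most n nonzeros.
    Blocks with fewer than n nonzeros are padded with explicit zeros so
    that each block stores exactly n entries (a fixed-size layout).
    """
    values, indices = [], []
    for row in dense:
        assert len(row) % m == 0, "row length must be a multiple of m"
        row_vals, row_idx = [], []
        for start in range(0, len(row), m):
            nz = [(start + j, row[start + j])
                  for j in range(m) if row[start + j] != 0]
            assert len(nz) <= n, "row violates the assumed N:M pattern"
            nz += [(start, 0)] * (n - len(nz))  # zero padding
            for col, val in nz:
                row_idx.append(col)
                row_vals.append(val)
        values.append(row_vals)
        indices.append(row_idx)
    return values, indices


def sparse_matmul(values, indices, b):
    """Compute C = A * B, with A given in the compressed form above."""
    cols = len(b[0])
    c = [[0] * cols for _ in values]
    for i, (row_vals, row_idx) in enumerate(zip(values, indices)):
        for val, k in zip(row_vals, row_idx):
            # Row-wise product: scale row k of B by val and accumulate
            # into row i of C. The loop over j is data-parallel.
            for j in range(cols):
                c[i][j] += val * b[k][j]
    return c


def dense_matmul(a, b):
    """Reference dense product, used only to check the sparse version."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```

In this sketch the stored column index selects which row of B is scaled and accumulated, so the work per row of A depends on the number of stored entries rather than on the full row length. The innermost loop over j is the part that would naturally map onto a vector operation on a vector processor.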
Taxonomy
Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
