Batched DGEMMs for scientific codes running on long vector architectures
Fabio Banchelli, Marta Garcia-Gasulla, Filippo Mantovani

TL;DR
This paper develops a batched DGEMM library in plain C for long vector architectures. It reports speedups of roughly 3.5x to 32.6x over the reference implementation on a RISC-V system with a vector processing unit, and shows that the code is portable to an Intel CPU.
Contribution
It introduces a portable batched DGEMM implementation in plain C, optimized for long vector architectures, and integrates it into the SeisSol seismic simulator.
Findings
Achieved speedups of approximately 3.5x to 32.6x over the reference implementation.
Successfully integrated batched DGEMM into SeisSol for seismic simulations.
Demonstrated portability from the RISC-V vector system to an Intel CPU, where execution times improved in most cases.
Abstract
In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. We focus on GEMM libraries and address their limited ability to leverage long vector architectures by developing a batched DGEMM library in plain C. This library achieves speedups ranging from approximately 3.5x to 32.6x compared to the reference implementation. We then integrate the batched approach into the SeisSol application, ensuring portability across different CPU architectures. Lastly, we demonstrate that our implementation is portable to an Intel CPU, resulting in improved execution times in most cases.
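To make the batched idea concrete, here is a minimal sketch in plain C. It is not the authors' library: the function name `dgemm_batched_sketch`, the helper `bidx`, and the interleaved memory layout are assumptions chosen for illustration, since this summary does not describe the actual data layout or kernel. The sketch stores the same element of every matrix in the batch contiguously, so the innermost loop runs over the batch with unit stride. That gives a compiler a long, regular loop to auto-vectorize even when each individual matrix is small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved layout (an assumption, not taken from the paper):
 * element (r, c) of matrix b is stored at (r * cols + c) * batch + b,
 * so the batch index is the unit-stride dimension. */
static inline size_t bidx(size_t r, size_t c, size_t cols,
                          size_t batch, size_t b)
{
    return (r * cols + c) * batch + b;
}

/* For every b in [0, batch): C[b] = alpha * A[b] * B[b] + beta * C[b].
 * A[b] is m x k, B[b] is k x n, C[b] is m x n.
 * The three buffers must not overlap (hence restrict). */
void dgemm_batched_sketch(size_t m, size_t n, size_t k, size_t batch,
                          double alpha,
                          const double *restrict A,
                          const double *restrict B,
                          double beta,
                          double *restrict C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double *c = &C[bidx(i, j, n, batch, 0)];

            /* Scale the existing C values; beta == 0 overwrites them. */
            for (size_t b = 0; b < batch; ++b)
                c[b] = (beta == 0.0) ? 0.0 : beta * c[b];

            for (size_t p = 0; p < k; ++p) {
                const double *a  = &A[bidx(i, p, k, batch, 0)];
                const double *bm = &B[bidx(p, j, n, batch, 0)];

                /* Unit-stride loop over the batch: the vectorizable part. */
                for (size_t b = 0; b < batch; ++b)
                    c[b] += alpha * a[b] * bm[b];
            }
        }
    }
}
```

With this layout the vector length that can be exploited grows with the batch size rather than with the matrix dimensions, which is the property that matters when the matrices are small and the hardware vectors are long. A caller would pack its matrices into the interleaved buffers once, then call the routine with the common dimensions `m`, `n`, `k` and the batch count.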
Taxonomy
Topics: Advanced Data Storage Technologies
