Batched DGEMMs for scientific codes running on long vector architectures

Fabio Banchelli; Marta Garcia-Gasulla; Filippo Mantovani

arXiv:2501.06175·cs.DC·January 13, 2025

TL;DR

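This paper develops a batched DGEMM library in plain C to better exploit long vector architectures. On a RISC-V system with a vector processing unit, the library achieves speedups of approximately 3.5x to 32.6x over the reference implementation. Integrated into SeisSol, the batched approach also ports to an Intel CPU, where it improves execution times in most cases.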

Contribution

It introduces a portable batched DGEMM implementation in plain C, optimized for long vector architectures, and integrates it into the SeisSol seismic simulator.

Findings

01. Achieved speedups of approximately 3.5x to 32.6x over the reference implementation.

02. Integrated the batched DGEMM approach into SeisSol while keeping it portable across CPU architectures.

03. Demonstrated portability from the RISC-V vector system to an Intel CPU, with improved execution times in most cases.

Abstract

In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. We focus on GEMM libraries and address their limited ability to leverage long vector architectures by developing a batched DGEMM library in plain C. This library achieves speedups ranging from approximately 3.5x to 32.6x compared to the reference implementation. We then integrate the batched approach into the SeisSol application, ensuring portability across different CPU architectures. Lastly, we demonstrate that our implementation is portable to an Intel CPU, resulting in improved execution times in most cases.
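
To illustrate why batching helps on long vector units, here is a minimal sketch in plain C. It is not the authors' library: the function name `batched_dgemm_interleaved`, the helper `idx`, and the interleaved data layout (batch index as the fastest-varying dimension) are assumptions chosen for illustration. The idea shown is that when each matrix is too small to fill a long vector register, looping over the batch in the innermost, unit-stride loop gives the compiler a long trip count to vectorize.

```c
#include <stddef.h>

/* Hypothetical helper: offset of element (row, col) of matrix b in an
 * interleaved batch, where the batch index varies fastest in memory. */
static inline size_t idx(size_t row, size_t col, size_t ncols,
                         size_t batch, size_t b)
{
    return (row * ncols + col) * batch + b;
}

/* Illustrative batched DGEMM (assumed layout, not the paper's API):
 * C[b] = alpha * A[b] * B[b] + beta * C[b] for every b in [0, batch).
 * A[b] is m x k, B[b] is k x n, C[b] is m x n. */
void batched_dgemm_interleaved(size_t m, size_t n, size_t k, size_t batch,
                               double alpha,
                               const double *restrict A,
                               const double *restrict B,
                               double beta,
                               double *restrict C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double *c = &C[idx(i, j, n, batch, 0)];

            /* Scale the existing C values; skip the read when beta == 0. */
            for (size_t b = 0; b < batch; ++b)
                c[b] = (beta == 0.0) ? 0.0 : beta * c[b];

            for (size_t p = 0; p < k; ++p) {
                const double *a  = &A[idx(i, p, k, batch, 0)];
                const double *bm = &B[idx(p, j, n, batch, 0)];

                /* Unit-stride loop over the batch: the vector length is
                 * bounded by the batch size, not by the matrix size. */
                for (size_t b = 0; b < batch; ++b)
                    c[b] += alpha * a[b] * bm[b];
            }
        }
    }
}
```

With this layout, element (i, j) of every matrix in the batch sits contiguously in memory, so the caller has to pack the inputs accordingly. Whether that packing cost pays off depends on the matrix sizes and batch count, which this sketch does not measure.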

Taxonomy

Topics: Advanced Data Storage Technologies