Accelerating the SpMV kernel on standard CPUs by exploiting the partially diagonal structures

Takeshi Fukaya; Koki Ishida; Akie Miura; Takeshi Iwashita; Hiroshi Nakashima

arXiv:2105.04937·cs.DC·May 12, 2021·6 cites

Open Access

TL;DR

This paper introduces the M-HDC format, a modified hybrid storage scheme that exploits partial diagonal structures in sparse matrices to accelerate SpMV operations on modern CPUs, demonstrating significant performance improvements.

Contribution

The paper proposes the M-HDC format, a novel modification of the HDC format, tailored to efficiently utilize partial diagonal structures in sparse matrices for faster SpMV on CPUs.

Findings

01

M-HDC format improves SpMV performance on CPUs for matrices with partial diagonal structures.

02

Cache blocking techniques enhance the efficiency of the proposed SpMV kernels.

03

Experimental results show notable speedups for practical matrices with diagonal patterns.
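
The cache blocking mentioned in finding 02 can be illustrated with a row-blocked kernel for the diagonal (DIA) part. This is a minimal sketch under stated assumptions, not the authors' kernel: the function name `spmv_dia_blocked`, the `block` parameter, and the storage layout (`dia[d][i]` holds the entry in row `i` on the diagonal with offset `offsets[d]`) are hypothetical choices made for illustration. The idea is to finish all diagonals for one block of rows before moving on, so the touched slices of `x` and `y` can stay in cache. Pure Python will not show the speedup; only the loop order is the point.

```python
def spmv_dia_blocked(offsets, dia, x, block=2):
    """Compute y = A @ x for a matrix stored by diagonals, one row block at a time.

    offsets[d] is the offset (col - row) of stored diagonal d, and
    dia[d][i] is the matrix entry in row i on that diagonal (assumed layout).
    """
    n = len(x)
    y = [0.0] * n
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Visit every diagonal for this row block before moving to the next block.
        for d, k in enumerate(offsets):
            lo = max(start, -k)      # skip rows where column i + k would be < 0
            hi = min(stop, n - k)    # skip rows where column i + k would be >= n
            for i in range(lo, hi):
                y[i] += dia[d][i] * x[i + k]
    return y
```

With `block` equal to `n`, the loop reduces to an ordinary unblocked DIA kernel, so the result is the same for any block size and only the memory access order changes.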

Abstract

Sparse matrix-vector multiplication (SpMV) is one of the basic building blocks in scientific computing, and there is continuing demand for accelerating it. In this research, we aim to accelerate SpMV on recent CPUs for sparse matrices that have a specific sparsity structure, namely a diagonally structured sparsity pattern. We focus on a hybrid storage format that combines the DIA and CSR formats, known as the HDC format. First, we recall the importance of introducing cache blocking techniques into HDC-based SpMV kernels. Next, based on observations of the cache-blocked kernel, we present a modified version of the HDC format, which we call the M-HDC format, in which partial diagonal structures are expected to be picked up more efficiently. For these SpMV kernels, we theoretically analyze the expected performance improvement based on performance models. Then, we conduct…
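
To make the hybrid DIA plus CSR idea concrete, the following is a minimal sketch of one way such a split could work. It is not the paper's implementation: the function names `split_hdc` and `spmv_hdc`, the `min_fill` threshold, and the rule "a diagonal goes to DIA when its nonzero count divided by n reaches `min_fill`" are all assumptions made for illustration. A dense list-of-lists input is used only to keep the example short.

```python
def split_hdc(a, min_fill=0.5):
    """Split a dense square matrix into a DIA part and a CSR part (illustrative rule)."""
    n = len(a)
    # Count nonzeros on each diagonal, keyed by offset = col - row.
    counts = {}
    for i in range(n):
        for j in range(n):
            if a[i][j] != 0:
                counts[j - i] = counts.get(j - i, 0) + 1
    # Assumed selection rule: keep diagonals that are dense enough relative to n.
    offsets = sorted(k for k, c in counts.items() if c / n >= min_fill)
    dia = [[0.0] * n for _ in offsets]  # dia[d][i] holds a[i][i + offsets[d]]
    for d, k in enumerate(offsets):
        for i in range(max(0, -k), min(n, n - k)):
            dia[d][i] = a[i][i + k]
    # Every remaining nonzero is stored row by row in CSR.
    chosen = set(offsets)
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(n):
            if a[i][j] != 0 and (j - i) not in chosen:
                col_idx.append(j)
                vals.append(a[i][j])
        row_ptr.append(len(vals))
    return offsets, dia, (row_ptr, col_idx, vals)


def spmv_hdc(offsets, dia, csr, x):
    """Compute y = A @ x as the sum of the DIA contribution and the CSR contribution."""
    n = len(x)
    y = [0.0] * n
    # DIA part: no column index array is needed, columns follow from the offset.
    for d, k in enumerate(offsets):
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += dia[d][i] * x[i + k]
    # CSR part: indirect access through col_idx for the scattered entries.
    row_ptr, col_idx, vals = csr
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[p] * x[col_idx[p]]
    return y
```

For a tridiagonal matrix with one extra entry in a corner, this split would place the three main diagonals in the DIA part and leave the single stray entry in CSR. In this sketch each diagonal is accepted or rejected as a whole, which is the limitation the abstract's "partial diagonal structures" refers to.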

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Model Reduction and Neural Networks