TL;DR
This paper introduces a novel speculative segmented sum algorithm for sparse matrix-vector multiplication on heterogeneous CPU-GPU processors, significantly improving performance over existing CSR-based methods.
Contribution
It proposes a new speculative execution approach that leverages both CPU and GPU cores for efficient SpMV on heterogeneous processors.
Findings
Achieves significant performance gains on three heterogeneous processors from Intel, AMD, and NVIDIA.
Evaluated on a benchmark suite of 20 sparse matrices.
Outperforms existing CSR-based SpMV algorithms.
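For context, the kind of CSR-based SpMV baseline these findings compare against can be sketched as a plain row-wise kernel. This is a generic textbook version, not any specific implementation evaluated in the paper; the names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are illustrative choices.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr has one entry per row plus a final entry equal to the number
    of nonzeros; row i owns the nonzeros row_ptr[i] .. row_ptr[i + 1] - 1.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc  # an empty row keeps 0.0
    return y


# Illustrative 3x3 matrix with an empty middle row:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 1.0, 1.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

Each row is reduced independently here, so rows with very different lengths give uneven work per row. A segmented sum over the flat list of products avoids that imbalance.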
Abstract
Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance…
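The two phases described in the abstract can be illustrated with a heavily simplified, serial sketch. It assumes one possible source of misprediction, empty rows: the first phase sums each run of nonzeros and speculates that the j-th sum belongs to row j, and the second phase re-arranges the sums into their true rows. The function names are hypothetical, and the tiling, parallel scan, and CPU-GPU hand-off of the actual method are not modelled.

```python
def speculative_segmented_sum(row_ptr, col_idx, values, x):
    """Phase 1 (stands in for the GPU part): one sum per non-empty row.

    Output position j is only a prediction of row j; it is wrong
    whenever an empty row precedes the segment.
    """
    nnz = len(values)
    # Flag the first nonzero of every non-empty row as a segment start.
    flags = [False] * nnz
    for i in range(len(row_ptr) - 1):
        if row_ptr[i] < row_ptr[i + 1]:
            flags[row_ptr[i]] = True

    sums = []
    for k in range(nnz):
        prod = values[k] * x[col_idx[k]]
        if flags[k]:
            sums.append(prod)   # start a new segment
        else:
            sums[-1] += prod    # continue the current segment
    return sums


def rearrange(row_ptr, sums):
    """Phase 2 (stands in for the CPU part): move sums to their true rows."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    j = 0
    for i in range(n_rows):
        if row_ptr[i] < row_ptr[i + 1]:  # non-empty row takes the next sum
            y[i] = sums[j]
            j += 1
    return y


# Illustrative 3x3 matrix with an empty middle row.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 1.0, 1.0]

predicted = speculative_segmented_sum(row_ptr, col_idx, values, x)
corrected = rearrange(row_ptr, predicted)
```

In this example the speculative phase yields two sums, and reading them as rows 0 and 1 would misplace the second one. The re-arrangement step shifts it to row 2 and leaves the empty row at zero.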
