Performance limitations for sparse matrix-vector multiplications on current multicore environments
Gerald Schubert, Georg Hager, Holger Fehske

TL;DR
This paper analyzes the performance bottlenecks of sparse matrix-vector multiplication on multicore processors, comparing different storage schemes and kernels to optimize parallel implementations.
Contribution
It provides a detailed performance analysis of sparse MVM on multicore systems, highlighting limitations and optimization strategies for different storage schemes.
Findings
Performance bottlenecks identified for sparse MVM on multicore systems
Comparison of cache-based and vector-oriented storage schemes
Insights into optimizing parallel sparse MVM implementations
Abstract
The increasing importance of multicore processors calls for a reevaluation of established numerical algorithms in view of their ability to profit from this new hardware concept. To optimize existing algorithms, detailed knowledge of the different performance-limiting factors is mandatory. In this contribution we investigate sparse matrix-vector multiplication (MVM), which is the dominant operation in many sparse eigenvalue solvers. Two conceptually different storage schemes and computational kernels have been conceived in the past to target cache-based and vector architectures, respectively. Starting from a series of microbenchmarks, we apply the insight gained to optimized sparse MVM implementations, whose serial and OpenMP-parallel performance we review on state-of-the-art multicore systems.
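To make the operation concrete, here is a minimal sketch of a sparse MVM kernel using compressed row storage (CRS), a format commonly associated with cache-based architectures. The abstract does not name the storage schemes, so the choice of CRS, the function name `crs_spmv`, and the array names `val`, `col_idx`, and `row_ptr` are assumptions made for illustration and not the authors' implementation.

```python
def crs_spmv(val, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A held in compressed row storage.

    val     -- nonzero values, stored row by row (illustrative name)
    col_idx -- column index of each entry in val (illustrative name)
    row_ptr -- row i occupies val[row_ptr[i]:row_ptr[i + 1]] (illustrative name)
    x       -- dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The inner loop streams through val and col_idx contiguously,
        # but reads x indirectly through col_idx.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += val[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small example matrix (assumed, not from the paper):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
val = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = crs_spmv(val, col_idx, row_ptr, x)
print(y)
```

Each row's result is independent of the others, so the outer loop over rows is the natural candidate for work sharing in a parallel version. The indirect access `x[col_idx[k]]` is the part whose memory behavior depends on the sparsity pattern of the matrix.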
