Feature-based SpMV Performance Analysis on Contemporary Devices
Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, Nikela Papadopoulou, Nectarios Koziris, Georgios Goumas

TL;DR
This paper analyzes the performance of sparse matrix-vector multiplication (SpMV) across modern computing devices, identifying key bottlenecks and how matrix features influence efficiency on CPUs, GPUs, and FPGAs.
Contribution
It introduces a methodology for analyzing SpMV performance in terms of structural matrix features across diverse hardware platforms, using an artificial matrix dataset and validating the approach on real-world matrices.
Findings
GPU performance varies with matrix features
Memory bandwidth and load imbalance are critical bottlenecks
Different architectures excel with different matrix characteristics
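To make the bottlenecks above concrete, here is a minimal sketch of SpMV over the common CSR (compressed sparse row) storage format, together with two simple per-row statistics. It is an illustration only, not the paper's implementation: the function names `spmv_csr` and `row_features`, and the max-over-mean imbalance ratio, are assumptions chosen for this example and may differ from the feature definitions the authors use.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is accessed indirectly through col_idx, so the access
            # pattern depends on where the nonzeros sit in the row.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_features(row_ptr):
    """Return simple row-length statistics (hypothetical feature set)."""
    n_rows = len(row_ptr) - 1
    row_nnz = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    avg_nnz = sum(row_nnz) / n_rows
    max_nnz = max(row_nnz)
    # Assumed imbalance proxy: longest row relative to the average row.
    imbalance = max_nnz / avg_nnz if avg_nnz else 0.0
    return {
        "avg_nnz_per_row": avg_nnz,
        "max_nnz_per_row": max_nnz,
        "imbalance": imbalance,
    }


# Example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

y = spmv_csr(row_ptr, col_idx, values, x)
features = row_features(row_ptr)
```

Each nonzero contributes one multiply-add but requires loading a value, a column index, and an element of `x`, which is why the kernel tends to be limited by memory bandwidth. When rows are split evenly across threads, a large gap between the longest and the average row is one way load imbalance can arise.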
Abstract
The SpMV kernel is characterized by high performance variation per input matrix and computing platform. While GPUs were considered State-of-the-Art for SpMV, with the emergence of advanced multicore CPUs and low-power FPGA accelerators, we need to revisit its performance and energy efficiency. This paper provides a high-level SpMV performance analysis based on structural features of matrices related to common bottlenecks of memory-bandwidth intensity, low ILP, load imbalance and memory latency overheads. Towards this, we create a wide artificial matrix dataset that spans these features and study the performance of different storage formats in nine modern HPC platforms: five CPUs, three GPUs and an FPGA. After validating our proposed methodology using real-world matrices, we analyze our extensive experimental results and draw key insights on the competitiveness of different target…
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Parallel Computing and Optimization Techniques
