Feature-based SpMV Performance Analysis on Contemporary Devices

Panagiotis Mpakos; Dimitrios Galanopoulos; Petros Anastasiadis; Nikela Papadopoulou; Nectarios Koziris; Georgios Goumas

arXiv:2302.04225·cs.DC·February 9, 2023

TL;DR

This paper analyzes the performance of sparse matrix-vector multiplication (SpMV) across modern computing devices, identifying key bottlenecks and how matrix features influence efficiency on CPUs, GPUs, and FPGAs.

Contribution

It introduces a comprehensive methodology for analyzing SpMV performance based on matrix features across diverse hardware platforms, using both artificial and real-world matrices.

Findings

1. GPU performance varies with matrix features

2. Memory bandwidth and load imbalance are critical bottlenecks

3. Different architectures excel with different matrix characteristics

Abstract

The SpMV kernel is characterized by high performance variation per input matrix and computing platform. While GPUs were considered the state of the art for SpMV, the emergence of advanced multicore CPUs and low-power FPGA accelerators means we need to revisit its performance and energy efficiency. This paper provides a high-level SpMV performance analysis based on structural features of matrices related to four common bottlenecks: memory-bandwidth intensity, low ILP, load imbalance and memory latency overheads. To this end, we create a wide artificial matrix dataset that spans these features and study the performance of different storage formats on nine modern HPC platforms: five CPUs, three GPUs and an FPGA. After validating our proposed methodology using real-world matrices, we analyze our extensive experimental results and draw key insights on the competitiveness of different target…
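For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV over the common CSR (compressed sparse row) storage format, together with a couple of row-length statistics of the kind that could indicate load imbalance. It is an illustration only and is not taken from the paper: the function names `spmv_csr` and `row_features`, and the specific feature definitions (average and maximum nonzeros per row, and their ratio), are assumptions chosen for this example.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x through col_idx: the irregular,
            # latency-sensitive part of the kernel.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_features(row_ptr):
    """Illustrative structural features (hypothetical definitions,
    not necessarily those used in the paper)."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    avg = sum(lengths) / len(lengths)
    longest = max(lengths)
    return {
        "avg_nnz_per_row": avg,
        "max_nnz_per_row": longest,
        # A ratio well above 1 suggests a few rows dominate the work.
        "imbalance": longest / avg if avg else 0.0,
    }


# Example matrix: [[1, 0, 2],
#                  [0, 3, 0],
#                  [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

print(spmv_csr(row_ptr, col_idx, values, x))  # [3.0, 3.0, 15.0]
print(row_features(row_ptr))
```

Each nonzero contributes one multiply-add but requires loading a value, a column index, and an element of `x`, which is why the kernel tends to be limited by memory bandwidth rather than arithmetic throughput.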

Taxonomy

Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Parallel Computing and Optimization Techniques