Explicit caching HYB: a new high-performance SpMV framework on GPGPU

Chong Chen

arXiv:2204.06666 · cs.DC · April 15, 2022 · 1 citation

TL;DR

This paper introduces EHYB, a new GPU-based SpMV framework that explicitly caches input vectors and optimizes data movement, significantly improving performance over existing methods in finite element method applications.

Contribution

The paper presents a novel explicit caching framework for GPU SpMV that reduces data movement and enhances performance beyond current state-of-the-art approaches.

Findings

01

EHYB outperforms existing GPU SpMV implementations.

02

Significant speedup achieved through explicit input vector caching.

03

Reported FLOPS exceed the theoretical performance upper bound of existing GPU-based SpMV implementations.
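To make the idea of explicit input vector caching concrete, here is a minimal CPU-side sketch. It is not the paper's implementation: a plain Python list stands in for a GPU shared-memory buffer, the CSR layout is chosen only for familiarity, and every name (`spmv_csr`, `spmv_cached_tiles`, `tile_width`) is a hypothetical chosen for illustration.

```python
# Illustrative CPU sketch only: a Python list stands in for GPU shared memory.
# All function and parameter names here are hypothetical, not from the paper.

def spmv_csr(row_ptr, col_idx, vals, x):
    """Reference CSR SpMV: y = A @ x, reading x directly for every nonzero."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += vals[k] * x[col_idx[k]]
    return y


def spmv_cached_tiles(row_ptr, col_idx, vals, x, tile_width):
    """Same product, but x is consumed one column tile at a time.

    For each tile, the matching slice of x is copied once into `cache`
    (the stand-in for a shared-memory buffer), and every nonzero whose
    column falls in the tile reads from that local copy.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for tile_start in range(0, len(x), tile_width):
        tile_end = min(tile_start + tile_width, len(x))
        cache = x[tile_start:tile_end]  # explicit copy of this tile of x
        for row in range(n_rows):
            for k in range(row_ptr[row], row_ptr[row + 1]):
                col = col_idx[k]
                if tile_start <= col < tile_end:
                    y[row] += vals[k] * cache[col - tile_start]
    return y


# 4x4 tridiagonal example in CSR form:
# [[4, 1, 0, 0],
#  [1, 4, 1, 0],
#  [0, 1, 4, 1],
#  [0, 0, 1, 4]]
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0, 4.0]

y_ref = spmv_csr(row_ptr, col_idx, vals, x)
y_tiled = spmv_cached_tiles(row_ptr, col_idx, vals, x, tile_width=2)
```

The tiled version rescans all nonzeros for every tile, which keeps the sketch short but would be wasteful in practice; a real kernel would presumably group nonzeros by tile ahead of time so each thread block touches only its own entries.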

Abstract

Sparse matrix-vector multiplication (SpMV) is a critical operation in the iterative solvers used by finite element methods (FEM) in computer simulation. Because SpMV is memory-bound, the efficiency of data movement heavily influences its performance on the GPU. In recent years, much research has been conducted on accelerating SpMV on graphics processing units (GPUs). The optimization methods in existing studies focus on two areas: improving load balancing between GPU processors and reducing execution divergence between GPU threads. Although some studies have made preliminary optimizations to input vector fetching, the effect of explicitly caching the input vector in GPU-based SpMV has not yet been studied in depth. In this study, we try to minimize the data movement cost of GPU-based SpMV using a new…
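The claim that SpMV is memory-bound can be illustrated with a rough roofline-style estimate. The sketch below is a back-of-the-envelope model for a CSR layout, not something taken from the paper: the matrix size, nonzero count, and the 900 GB/s bandwidth figure are all assumed values, and the helper names are hypothetical.

```python
def csr_spmv_intensity(n_rows, n_cols, nnz, value_bytes=8, index_bytes=4):
    """Arithmetic intensity (FLOPs per byte) of CSR SpMV in the best case.

    Assumes every array is moved exactly once, i.e. perfect reuse of x.
    """
    flops = 2 * nnz  # one multiply and one add per nonzero
    bytes_moved = (
        nnz * (value_bytes + index_bytes)  # matrix values + column indices
        + (n_rows + 1) * index_bytes       # row pointers
        + n_cols * value_bytes             # input vector x, read once
        + n_rows * value_bytes             # output vector y, written once
    )
    return flops / bytes_moved


def bandwidth_bound_gflops(intensity, bandwidth_gb_s):
    """Upper bound on throughput when memory bandwidth is the only limit."""
    return intensity * bandwidth_gb_s


# Assumed example: 1M x 1M matrix with 27 nonzeros per row (hypothetical),
# double-precision values, 32-bit indices, 900 GB/s memory bandwidth.
intensity = csr_spmv_intensity(1_000_000, 1_000_000, 27_000_000)
bound = bandwidth_bound_gflops(intensity, 900.0)
```

Under these assumptions the intensity comes out well below 1 FLOP per byte, so the bandwidth bound sits far under the peak arithmetic rate of a modern GPU. That is why reducing or speeding up data movement, rather than adding compute, is the lever the abstract focuses on.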

Code & Models

Repositories

chong-chen-unlv/ehyb_spmv_gpu
Official

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Embedded Systems Design Techniques