Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Guyue Huang; Guohao Dai; Yu Wang; Yufei Ding; Yuan Xie

arXiv:2106.16064 · cs.DC · October 15, 2021 · 1 citation

TL;DR

This paper introduces optimized sparse matrix kernels that adaptively balance workload and use parallel reduction, outperforming cuSPARSE by 1.07-1.57x on GPUs and accelerating GNN training.

Contribution

The paper provides a comprehensive implementation and analysis of workload-balancing and parallel-reduction techniques for SpMV and SpMM, filling three pieces missing from prior work.

Findings

01. Achieves 1.07-1.57x speedup over cuSPARSE on GPUs.

02. Develops a segment-reduction algorithm with SIMD-shuffle primitives.

03. Identifies input data features affecting workload-balancing effectiveness.
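
To make the workload-balancing finding concrete, here is a minimal Python sketch that compares two ways of dividing a CSR matrix among parallel workers: by rows and by non-zeros. The function names, the `imbalance` metric (maximum load over mean load), and the toy row lengths are all assumptions made for illustration. This summary does not say which input features the paper identifies, so row-length skew is used here only as a plausible example.

```python
from itertools import accumulate

def row_split_loads(row_ptr, num_workers):
    """Give each worker an equal share of rows; return non-zeros per worker."""
    num_rows = len(row_ptr) - 1
    bounds = [w * num_rows // num_workers for w in range(num_workers + 1)]
    return [row_ptr[bounds[w + 1]] - row_ptr[bounds[w]] for w in range(num_workers)]

def nnz_split_loads(row_ptr, num_workers):
    """Give each worker an equal share of non-zeros, ignoring row boundaries."""
    nnz = row_ptr[-1]
    bounds = [w * nnz // num_workers for w in range(num_workers + 1)]
    return [bounds[w + 1] - bounds[w] for w in range(num_workers)]

def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical skewed matrix: one long row and three very short ones.
row_lengths = [97, 1, 1, 1]
row_ptr = [0] + list(accumulate(row_lengths))  # CSR row pointer

print(imbalance(row_split_loads(row_ptr, 4)))  # row-split: one worker gets 97 of 100
print(imbalance(nnz_split_loads(row_ptr, 4)))  # nnz-split: 25 each
```

With evenly sized rows the two partitions give nearly the same loads, so the extra bookkeeping of splitting by non-zeros buys little. This is one reason an adaptive choice between them can be worthwhile.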

Abstract

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are widely-used design principles for efficient SpMV. However, prior work fails to resolve how to implement and adaptively use the two principles for SpMV/MM. To overcome this obstacle, we first complete the implementation space with optimizations by filling three missing pieces in prior work, including: (1) We show that workload-balancing and parallel-reduction can be combined through a segment-reduction algorithm implemented with SIMD-shuffle primitives. (2) We show that parallel-reduction can be implemented in SpMM through loading the dense-matrix rows with vector memory operations. (3) We show that vectorized loading of sparse rows, being a part of the…
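
The idea in point (1) can be sketched in plain Python by simulating a warp's registers as lists. This is an illustrative simulation under stated assumptions, not the authors' CUDA kernel: `shfl_up` mimics the behavior of a shuffle-up primitive such as CUDA's `__shfl_up_sync`, and `warp_segment_reduce`, `spmv_balanced`, and the 32-lane width are hypothetical names and choices. Each lane owns one non-zero (workload-balancing), and lanes holding the same row combine their products through a segmented scan (parallel-reduction).

```python
WARP = 32  # lanes per simulated warp (assumption: NVIDIA-style warp width)

def shfl_up(regs, offset):
    """Simulate shuffle-up: lane i reads the register of lane i - offset.
    Lanes with i < offset keep their own value."""
    return [regs[i - offset] if i >= offset else regs[i] for i in range(len(regs))]

def warp_segment_reduce(rows, vals):
    """Inclusive segmented scan within one warp; rows must be non-decreasing.
    Afterwards, the last lane of each row segment holds that segment's sum."""
    offset = 1
    while offset < len(rows):
        up_rows, up_vals = shfl_up(rows, offset), shfl_up(vals, offset)
        vals = [v + uv if lane >= offset and ur == r else v
                for lane, (r, v, ur, uv)
                in enumerate(zip(rows, vals, up_rows, up_vals))]
        offset *= 2
    return vals

def spmv_balanced(row_idx, col_idx, data, x, num_rows):
    """y = A @ x with one non-zero per lane; A is given in row-sorted COO form."""
    y = [0.0] * num_rows
    for start in range(0, len(data), WARP):
        rows = row_idx[start:start + WARP]
        prods = [data[start + k] * x[col_idx[start + k]] for k in range(len(rows))]
        sums = warp_segment_reduce(rows, prods)
        for lane, r in enumerate(rows):
            is_tail = lane == len(rows) - 1 or rows[lane + 1] != r
            if is_tail:
                y[r] += sums[lane]  # on a GPU this would be an atomic add
    return y

# A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]], x = [1, 2, 3]
y = spmv_balanced([0, 0, 2], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 3)
print(y)
```

Because a row can straddle two warps, each segment tail accumulates into `y` rather than overwriting it. In a real kernel that accumulation would need an atomic operation or a second reduction pass.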

Code & Models

Repositories

hgyhungry/ge-spmm
PyTorch · Official

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Caching and Content Delivery