Heuristic Adaptability to Input Dynamics for SpMM on GPUs
Guohao Dai, Guyue Huang, Shang Yang, Zhongming Yu, Hengrui Zhang, Yufei Ding, Yuan Xie, Huazhong Yang, Yu Wang

TL;DR
This paper introduces a novel auto-tuning approach for SpMM on GPUs that adapts to input data dynamics, significantly improving performance over static algorithms.
Contribution
The paper proposes a three-loop model that captures orthogonal design principles for SpMM, and introduces DA-SpMM, a heuristic GPU kernel that selects an SpMM algorithm dynamically based on the input data.
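To make the three-loop idea concrete, here is a minimal reference sketch of SpMM in plain Python, with the sparse matrix stored in CSR form. The mapping of loops (rows of the sparse matrix, nonzeros within a row, columns of the dense matrix) is one plausible reading and an assumption; this summary does not define the loops. The function name `spmm_csr` and the sample data are hypothetical, and this is not the DA-SpMM kernel itself.

```python
def spmm_csr(indptr, indices, data, B):
    """Reference SpMM: C = A @ B, with A in CSR form and B a dense list of rows.

    Illustrative only: a sequential CPU sketch, not the paper's GPU kernel.
    """
    n_rows = len(indptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):                        # loop 1: rows of sparse A
        for p in range(indptr[i], indptr[i + 1]):  # loop 2: nonzeros of row i (reduction)
            k, a = indices[p], data[p]
            for j in range(n_cols):                # loop 3: columns of dense B
                C[i][j] += a * B[k][j]
    return C


# Hypothetical sample input: A = [[1, 0, 2],
#                                 [0, 0, 3]]
indptr, indices, data = [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = spmm_csr(indptr, indices, data, B)
```

On a GPU, each loop can be mapped to threads in different ways (for example, how rows or nonzeros are split across threads, and whether the reduction runs sequentially or in parallel). The best choice depends on the sparsity pattern and the dense width, which is why a single static mapping may lose performance on some inputs.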
Findings
DA-SpMM achieves 1.26x to 1.37x speedup over NVIDIA cuSPARSE.
It delivers up to 5.59x end-to-end speedup in Graph Neural Network applications.
The three-loop model covers existing and new SpMM designs.
Abstract
Sparse Matrix-Matrix Multiplication (SpMM) serves as a fundamental component in various domains. Many previous studies exploit GPUs for SpMM acceleration because GPUs provide high bandwidth and parallelism. We point out that a static design does not always improve the performance of SpMM on different input data (e.g., >85% performance loss with a single algorithm). In this paper, we consider the challenge of input dynamics from a novel auto-tuning perspective, while the following issues remain to be solved: (1) Orthogonal design principles considering sparsity. Orthogonal design principles for such a sparse problem should be extracted to form different algorithms, and further used for performance tuning. (2) Nontrivial implementations in the algorithm space. Combining orthogonal design principles to create new algorithms requires tackling new challenges such as thread race handling. (3)…
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Interconnection Networks and Systems
