SMASH: Sparse Matrix Atomic Scratchpad Hashing
Kaustubh Shivdikar

TL;DR
This thesis introduces a row-wise product SpGEMM kernel that uses atomic instructions to handle sparse matrix multiplication efficiently, reporting a 9.4x speedup over prior approaches on the PIUMA accelerator.
Contribution
A new row-wise product SpGEMM kernel leveraging atomic instructions is proposed, reducing memory overhead and improving performance on specialized hardware.
Findings
Achieves a 9.4x speedup over prior approaches
Effectively reduces redundant memory fetches
Optimized for the PIUMA accelerator architecture
Abstract
Sparse matrix kernels, and SpGEMM (sparse general matrix-matrix multiplication) in particular, appear in a wide range of applications, from graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM kernels has been the pressure placed on DRAM. One approach to this problem is to implement the SpGEMM kernel with an inner product method. While the inner product produces fewer intermediate results, it can saturate the memory bandwidth because of the high number of redundant fetches of input matrix elements. An outer product-based SpGEMM kernel can reduce redundant fetches, but at the cost of extra computation and memory accesses for producing and managing partial products. In this thesis, we introduce a novel SpGEMM kernel implementation based on the row-wise product approach. We leverage…
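The row-wise product formulation named in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the thesis: the helper names `to_csr` and `spgemm_rowwise` are hypothetical, matrices are stored in CSR form, a Python dictionary stands in for a hash table held in scratchpad memory, and the loop runs sequentially, so no atomic instructions are involved. It is not the PIUMA kernel itself, only the general idea that each nonzero A[i][k] scales row k of B and the partial products for output row i are merged in a hash-based accumulator.

```python
from collections import defaultdict


def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def spgemm_rowwise(a, b):
    """Row-wise product C = A @ B for CSR inputs; returns C in CSR form.

    Hypothetical helper: a per-row dict plays the role of the hash
    accumulator. A parallel kernel would need atomic updates here;
    this sequential sketch does not.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = defaultdict(float)  # column index -> accumulated value for row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i][k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                acc[b_idx[q]] += a_ik * b_val[q]
        # Emit row i of C with column indices in ascending order.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


if __name__ == "__main__":
    a = to_csr([[1, 0, 2], [0, 0, 3], [4, 5, 0]])
    b = to_csr([[0, 1, 0], [2, 0, 0], [0, 3, 4]])
    print(spgemm_rowwise(a, b))
```

In this sketch each row of A is read once and only the rows of B selected by its nonzeros are touched, while the partial products for one output row stay in a single small accumulator instead of being written out and merged later.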
