Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad

TL;DR
This paper introduces a distributed-memory algorithm for TS-SpGEMM, the multiplication of a square sparse matrix by a tall-and-skinny sparse matrix. It attains 5x average performance gains over 2D and 3D SUMMA, with applications such as multi-source breadth-first search, sparse graph embedding, and algebraic multigrid solvers.
Contribution
A new distributed-memory algorithm for TS-SpGEMM that uses customized 1D partitioning for all matrices, sparsity-aware tiling for data transfers, and a mix of local and remote computations, attaining 5x average performance gains over 2D and 3D SUMMA.
Findings
Achieves a 5x average speedup over 2D and 3D SUMMA.
Scales efficiently up to 512 nodes (65,536 cores).
Successfully applied to BFS and graph embedding tasks.
Abstract
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement…
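The abstract's combination of 1D partitioning and sparsity-aware data transfer can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`row_block`, `owner_of`, `partition_rows`, `ts_spgemm_1d`), the dict-of-rows matrix format, and the plain block row partition are all hypothetical choices made for illustration. It covers only the case where each simulated process fetches the rows of B that match nonzero columns of its local block of A; tiling and the remote-computation mode mentioned in the abstract are not modeled.

```python
from collections import defaultdict

def row_block(n, p, rank):
    """Half-open row range [lo, hi) owned by `rank` in a 1D block partition."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def owner_of(row, n, p):
    """Rank that owns a given row index."""
    for rank in range(p):
        lo, hi = row_block(n, p, rank)
        if lo <= row < hi:
            return rank
    raise ValueError("row index out of range")

def partition_rows(M, n, p):
    """Split a sparse matrix stored as {row: {col: val}} into p row blocks."""
    parts = [dict() for _ in range(p)]
    for r, cols in M.items():
        parts[owner_of(r, n, p)][r] = cols
    return parts

def ts_spgemm_1d(A, B, n, p):
    """Compute C = A @ B with A (n x n) and B (n x d) both row-partitioned.

    Returns C and the number of B rows that had to come from another rank,
    a stand-in for communication volume in this simulation.
    """
    A_parts = partition_rows(A, n, p)
    B_parts = partition_rows(B, n, p)
    C, fetched_rows = {}, 0
    for rank in range(p):
        # Nonzero columns of the local A block are the only B rows needed.
        needed = {k for cols in A_parts[rank].values() for k in cols}
        B_needed = {}
        for k in needed:
            src = owner_of(k, n, p)
            row = B_parts[src].get(k)
            if row is None:
                continue  # empty row of B: nothing to transfer
            if src != rank:
                fetched_rows += 1  # simulated remote fetch
            B_needed[k] = row
        # Row-wise (Gustavson-style) local multiplication.
        for i, a_row in A_parts[rank].items():
            acc = defaultdict(float)
            for k, a_ik in a_row.items():
                for j, b_kj in B_needed.get(k, {}).items():
                    acc[j] += a_ik * b_kj
            if acc:
                C[i] = dict(acc)
    return C, fetched_rows

# Toy input: 4 x 4 sparse A, 4 x 2 sparse B, two simulated processes.
A = {0: {0: 1.0, 3: 2.0}, 1: {1: 3.0}, 2: {2: 4.0}, 3: {0: 5.0}}
B = {0: {0: 1.0}, 1: {1: 1.0}, 3: {0: 2.0}}
C, fetched = ts_spgemm_1d(A, B, n=4, p=2)
```

In this toy input, only two rows of B cross the simulated process boundary, whereas a sparsity-oblivious scheme would move every remote row. A real distributed version would replace the dictionary lookups with message passing between ranks.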
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Interconnection Networks and Systems
