Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
Emanuel H. Rubensson, Elias Rudberg

TL;DR
This paper introduces a scalable parallel method for block-sparse matrix multiplication on distributed systems, leveraging a quadtree representation and the Chunks and Tasks model to optimize data locality and reduce communication.
Contribution
The authors develop a locality-aware parallel multiplication approach using quadtree matrices and the Chunks and Tasks model, enabling efficient scaling and dynamic sparsity detection.
Findings
Achieves favorable weak and strong scaling of the communication cost with the number of processes, shown both theoretically and in numerical experiments.
Reduces communication significantly compared to non-locality-aware methods.
Effectively utilizes CPUs and GPUs for leaf-level computations.
Abstract
We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments. Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix…
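The quadtree idea described in the abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation and does not use the Chunks and Tasks library: the `Node` class, the function names, and the leaf size of 2 are all hypothetical, leaves are stored as dense lists of rows rather than block-sparse submatrices, and matrix dimensions are assumed to be the leaf size times a power of two. A subtree that is entirely zero is stored as `None`, so sparsity is detected at whatever level it occurs and the corresponding products are skipped.

```python
class Node:
    """Internal quadtree node (hypothetical name, for illustration only)."""
    def __init__(self, children):
        # Order: [top-left, top-right, bottom-left, bottom-right]
        self.children = children

def build(rows, leaf=2):
    """Build a quadtree from a square list-of-rows matrix; None marks a zero subtree."""
    if all(v == 0 for r in rows for v in r):
        return None
    n = len(rows)
    if n <= leaf:
        return [list(r) for r in rows]  # dense leaf block
    h = n // 2
    quads = [[r[:h] for r in rows[:h]], [r[h:] for r in rows[:h]],
             [r[:h] for r in rows[h:]], [r[h:] for r in rows[h:]]]
    return Node([build(q, leaf) for q in quads])

def add(x, y):
    """Add two quadtrees of equal dimension, treating None as zero."""
    if x is None:
        return y
    if y is None:
        return x
    if isinstance(x, Node):
        return Node([add(p, q) for p, q in zip(x.children, y.children)])
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]

def multiply(x, y):
    """Multiply two quadtrees; any product with a zero subtree is skipped."""
    if x is None or y is None:
        return None
    if isinstance(x, Node):
        a, b = x.children, y.children
        # C_ij = A_i0 * B_0j + A_i1 * B_1j for the 2x2 block partition
        c = [add(multiply(a[2 * i], b[j]), multiply(a[2 * i + 1], b[2 + j]))
             for i in (0, 1) for j in (0, 1)]
        return None if all(q is None for q in c) else Node(c)
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def to_dense(t, n):
    """Expand a quadtree of dimension n back into a list of rows."""
    if t is None:
        return [[0] * n for _ in range(n)]
    if isinstance(t, Node):
        h = n // 2
        q = [to_dense(c, h) for c in t.children]
        return ([q[0][i] + q[1][i] for i in range(h)] +
                [q[2][i] + q[3][i] for i in range(h)])
    return [list(r) for r in t]
```

In the method described in the abstract, the nodes are chunk objects distributed over processes; this sketch shows only the recursive structure and the skipping of zero branches on a single process.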
