Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Md Taufique Hussain; Oguz Selvitopi; Aydin Buluç; Ariful Azad

arXiv:2010.08526·cs.DC·October 19, 2020

TL;DR

This paper introduces a communication-avoiding, memory-constrained algorithm for distributed sparse matrix-matrix multiplication (SpGEMM) that scales to 262,144 cores and can multiply matrices even when the full output does not fit in aggregate memory.

Contribution

The paper presents a distributed SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) by addressing the two main obstacles at that scale: high communication cost and insufficient memory to hold the output.

Findings

01

Runs 10x faster on large-scale protein-similarity matrices when going from 16,384 to 262,144 cores on a Cray XC40

02

Scales efficiently to hundreds of thousands of processors

03

Multiplies sparse matrices of any size as long as the inputs and a fraction of the output fit in aggregate memory

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. In this paper, we consider SpGEMMs performed on hundreds of thousands of processors generating trillions of nonzeros in the output matrix. Distributed SpGEMM at this extreme scale faces two key challenges: (1) high communication cost and (2) inadequate memory to generate the output. We address these challenges with an integrated communication-avoiding and memory-constrained SpGEMM algorithm that scales to 262,144 cores (more than 1 million hardware threads) and can multiply sparse matrices of any size as long as inputs and a fraction of output fit in the aggregated memory. As we go from 16,384 cores to 262,144 cores on a Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when multiplying large-scale protein-similarity matrices.
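
The memory-constrained idea in the abstract, keeping only a fraction of the output in memory at a time, can be illustrated on a single machine. The sketch below is not the paper's distributed algorithm: it assumes SciPy, splits the columns of B into batches, and hands each partial product to a caller-supplied callback before computing the next one. The names `batched_spgemm`, `num_batches`, and `consume` are hypothetical and chosen only for this illustration.

```python
import numpy as np
import scipy.sparse as sp


def batched_spgemm(A, B, num_batches, consume):
    """Compute A @ B one column batch at a time (illustrative sketch only).

    Only one batch of output columns is held in memory at once. Each batch
    is passed to `consume(start_col, C_batch)`, which may write it to disk,
    prune it, or reduce it. Returns the total number of stored output nonzeros.
    """
    B = B.tocsc()  # CSC makes column slicing cheap
    n_cols = B.shape[1]
    # Batch boundaries: num_batches roughly equal column ranges.
    bounds = np.linspace(0, n_cols, num_batches + 1, dtype=int)
    total_nnz = 0
    for start, stop in zip(bounds[:-1], bounds[1:]):
        start, stop = int(start), int(stop)
        if start == stop:
            continue  # more batches than columns: skip empty ranges
        C_batch = A @ B[:, start:stop]  # partial output for these columns
        consume(start, C_batch)
        total_nnz += C_batch.nnz
    return total_nnz


if __name__ == "__main__":
    # Hypothetical small inputs; real use cases are far larger.
    A = sp.random(200, 150, density=0.05, format="csr", random_state=0)
    B = sp.random(150, 180, density=0.05, format="csr", random_state=1)

    def report(start_col, C_batch):
        print(f"columns from {start_col}: {C_batch.nnz} nonzeros")

    print("total nonzeros:", batched_spgemm(A, B, num_batches=4, consume=report))
```

Raising `num_batches` lowers the peak memory needed for the output at the cost of more passes over A, which is the general trade-off a batched scheme makes.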
