Distributed-Memory Sparse Kernels for Machine Learning

Vivek Bharadwaj; Aydin Buluç; James Demmel

arXiv:2203.07673·cs.DC·March 22, 2022

TL;DR

This paper develops and benchmarks distributed-memory algorithms for fused sparse-dense matrix operations, significantly reducing communication costs and accelerating large-scale machine learning tasks.

Contribution

The paper introduces communication-eliding strategies for the fused SDDMM and SpMM kernel (FusedMM), and shows that distributed-memory 1.5D and 2.5D SpMM algorithms can be converted to SDDMM algorithms with identical communication costs and data layouts.

Findings

01

The FusedMM algorithms spend at least 30% less time in communication than running SDDMM and SpMM in sequence.

02

The algorithms achieve at least a 10x speedup over PETSc's SpMM on large real-world matrices.

03

The communication-eliding techniques run up to 1.6 times faster than an unoptimized sequence of SDDMM and SpMM.

Abstract

Sampled Dense Times Dense Matrix Multiplication (SDDMM) and Sparse Times Dense Matrix Multiplication (SpMM) appear in diverse settings, such as collaborative filtering, document clustering, and graph embedding. Frequently, the SDDMM output becomes the input sparse matrix for a subsequent SpMM operation. Existing work has focused on shared memory parallelization of these primitives. While there has been extensive analysis of communication-minimizing distributed 1.5D algorithms for SpMM, no such analysis exists for SDDMM or the back-to-back sequence of SDDMM and SpMM, termed FusedMM. We show that distributed memory 1.5D and 2.5D algorithms for SpMM can be converted to algorithms for SDDMM with identical communication costs and input / output data layouts. Further, we give two communication-eliding strategies to reduce costs further for FusedMM kernels: either reusing the replication of an…
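To make the three kernels named in the abstract concrete, here is a minimal single-process sketch in pure Python. It assumes the common definitions SDDMM(S, A, B) = S ∘ (A Bᵀ) and SpMM(S, B) = S B, with the sparse matrix stored as a list of (row, column, value) triples. The function names, the storage layout, and the choice of which dense matrix feeds the SpMM step are illustrative assumptions, not the paper's implementation, and nothing here is distributed.

```python
# Illustrative single-process sketch; names and the COO-triple layout are
# assumptions made for this example, not code from the paper's repository.

def sddmm(S, A, B):
    """Sampled dense-dense product: for each nonzero (i, j, v) of S,
    scale v by the dot product of row i of A and row j of B."""
    return [(i, j, v * sum(a * b for a, b in zip(A[i], B[j])))
            for i, j, v in S]

def spmm(S, B, n_rows):
    """Sparse-times-dense product: returns the dense n_rows x r matrix S @ B."""
    r = len(B[0])
    out = [[0.0] * r for _ in range(n_rows)]
    for i, j, v in S:
        for k in range(r):
            out[i][k] += v * B[j][k]
    return out

def fused_mm(S, A, B):
    """Back-to-back sequence: the SDDMM output is the sparse input of SpMM."""
    return spmm(sddmm(S, A, B), B, len(A))

# A 2 x 3 sparse matrix with three nonzeros, and dense factors of rank 2.
S = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
A = [[1.0, 2.0], [0.0, 1.0]]              # 2 x 2
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2

sampled = sddmm(S, A, B)
result = fused_mm(S, A, B)
```

Note that the SDDMM output has exactly the sparsity pattern of S, which is why it can feed directly into SpMM without changing the sparse data layout.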

Code & Models

Repositories

PASSIONLab/distributed_sddmm
Official

Taxonomy

Topics: Advanced Graph Neural Networks · Functional Brain Connectivity Studies · Stochastic Gradient Optimization Techniques