SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication

Myung-Hwan Jang; Yunyong Ko; Hyuck-Moo Gwon; Ikhyeon Jo; Yongjun Park; Sang-Wook Kim

arXiv:2308.13626·cs.DS·August 29, 2023

TL;DR

SAGE is a storage-based approach to sparse generalized matrix-matrix multiplication (SpGEMM) that keeps networks on storage (e.g., SSD) and uses a 3-layer architecture to load only the required parts into main memory; the authors report that it outperforms existing methods in large-scale network analysis.

Contribution

The paper proposes SAGE, a novel storage-based SpGEMM method that reduces memory bottlenecks and communication overhead, enabling scalable and efficient large-scale network processing.

Findings

01. SAGE outperforms existing methods in scalability and efficiency.

02. SAGE effectively balances workloads and reduces I/O overhead.

03. The approach prevents buffer overflows with distribution-aware memory allocation.

Abstract

Sparse generalized matrix-matrix multiplication (SpGEMM) is a fundamental operation for real-world network analysis. With the increasing size of real-world networks, the single-machine-based approach cannot perform SpGEMM on large-scale networks that exceed the size of main memory (i.e., it is not scalable). Although the distributed-system-based approach can handle large-scale SpGEMM using multiple machines, it suffers from severe inter-machine communication overhead when aggregating the results of multiple machines (i.e., it is not efficient). To address this dilemma, in this paper, we propose a novel storage-based SpGEMM approach (SAGE) that stores given networks in storage (e.g., SSD) and, via a 3-layer architecture, loads only the necessary parts of the networks into main memory when they are required for processing. Furthermore, we point out three challenges that could degrade the overall…
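The storage-based idea in the abstract (keep the matrices on disk, bring in only the rows a computation touches) can be illustrated with a minimal sketch. This is not SAGE's 3-layer architecture, which this page does not describe in detail. It assumes a CSR layout stored as three `.npy` files and uses NumPy memory mapping as a stand-in for on-demand loading from an SSD. The function names `save_csr`, `open_csr`, and `spgemm_from_storage` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np


def save_csr(prefix, dense):
    """Write a small dense matrix to disk in CSR form (three .npy files)."""
    dense = np.asarray(dense, dtype=np.float64)
    indptr, indices, data = [0], [], []
    for row in dense:
        cols = np.nonzero(row)[0]
        indices.extend(cols.tolist())
        data.extend(row[cols].tolist())
        indptr.append(len(indices))
    np.save(prefix + "_indptr.npy", np.array(indptr, dtype=np.int64))
    np.save(prefix + "_indices.npy", np.array(indices, dtype=np.int64))
    np.save(prefix + "_data.npy", np.array(data, dtype=np.float64))


def open_csr(prefix):
    """Memory-map the CSR arrays so data is read from disk only when accessed."""
    suffixes = ("_indptr.npy", "_indices.npy", "_data.npy")
    return tuple(np.load(prefix + s, mmap_mode="r") for s in suffixes)


def spgemm_from_storage(a_prefix, b_prefix):
    """Row-wise SpGEMM C = A * B over disk-resident CSR matrices.

    For each nonzero A[i, k], only row k of B is sliced out of the
    memory-mapped files. Returns one {column: value} dict per row of C.
    """
    a_ptr, a_idx, a_val = open_csr(a_prefix)
    b_ptr, b_idx, b_val = open_csr(b_prefix)
    rows = []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for p in range(int(a_ptr[i]), int(a_ptr[i + 1])):
            k, a_ik = int(a_idx[p]), float(a_val[p])
            lo, hi = int(b_ptr[k]), int(b_ptr[k + 1])
            for j, b_kj in zip(b_idx[lo:hi], b_val[lo:hi]):
                j = int(j)
                acc[j] = acc.get(j, 0.0) + a_ik * float(b_kj)
        rows.append(acc)
    return rows


def demo():
    """Multiply two tiny matrices through the on-disk path."""
    a = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
    b = np.array([[0.0, 1.0], [4.0, 0.0], [5.0, 6.0]])
    with tempfile.TemporaryDirectory() as d:
        a_prefix, b_prefix = os.path.join(d, "A"), os.path.join(d, "B")
        save_csr(a_prefix, a)
        save_csr(b_prefix, b)
        return spgemm_from_storage(a_prefix, b_prefix)
```

Calling `demo()` returns the rows of the product as sparse dictionaries. In this sketch the operating system's page cache decides what stays in memory; a system like the one described would presumably manage its own buffers and scheduling, which the sketch does not attempt.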
