SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication
Myung-Hwan Jang, Yunyong Ko, Hyuck-Moo Gwon, Ikhyeon Jo, Yongjun Park, Sang-Wook Kim

TL;DR
SAGE introduces a storage-based approach for scalable, efficient sparse matrix multiplication that leverages SSDs and a three-layer architecture to outperform existing methods in large-scale network analysis.
Contribution
The paper proposes SAGE, a novel storage-based SpGEMM method that reduces memory bottlenecks and communication overhead, enabling scalable and efficient large-scale network processing.
Findings
SAGE outperforms existing methods in scalability and efficiency.
SAGE effectively balances workloads and reduces I/O overhead.
The approach prevents buffer overflows with distribution-aware memory allocation.
Abstract
Sparse generalized matrix-matrix multiplication (SpGEMM) is a fundamental operation for real-world network analysis. As real-world networks grow, the single-machine-based approach cannot perform SpGEMM on large-scale networks that exceed the size of main memory (i.e., it is not scalable). Although the distributed-system-based approach can handle large-scale SpGEMM across multiple machines, it suffers from severe inter-machine communication overhead when aggregating the results of those machines (i.e., it is not efficient). To address this dilemma, we propose a novel storage-based SpGEMM approach (SAGE) that keeps the given networks in storage (e.g., SSD) and, via a 3-layer architecture, loads only the necessary parts of the networks into main memory when they are required for processing. Furthermore, we point out three challenges that could degrade the overall…
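The core idea in the abstract, keeping the matrices in storage and loading only the parts needed for the current step, can be illustrated with a minimal Python sketch. This is not SAGE's implementation and does not reproduce its 3-layer architecture, which the excerpt above does not detail. It assumes a plain row-wise (Gustavson-style) product, uses JSON files in a temporary directory as a stand-in for SSD-resident row blocks, and all names (`write_blocks`, `load_block`, `spgemm_out_of_core`, `demo`) are hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict


def write_blocks(matrix, block_size, directory, name):
    """Persist a sparse matrix (list of {col: value} rows) as row blocks on disk."""
    paths = []
    for start in range(0, len(matrix), block_size):
        path = os.path.join(directory, f"{name}_{start // block_size}.json")
        with open(path, "w") as f:
            # Each row is stored as a list of [col, value] pairs.
            json.dump([list(row.items()) for row in matrix[start:start + block_size]], f)
        paths.append(path)
    return paths


def load_block(path):
    """Load one row block from storage back into memory."""
    with open(path) as f:
        return [{int(c): v for c, v in row} for row in json.load(f)]


def spgemm_out_of_core(a_paths, b_paths, block_size):
    """Compute A x B while holding one block of A and one block of B at a time.

    Assumes the number of columns of A equals the number of rows of B.
    """
    result = []
    for a_path in a_paths:
        a_block = load_block(a_path)
        accs = [defaultdict(float) for _ in a_block]
        # Only the B blocks referenced by nonzero columns of this A block are read.
        needed = sorted({k // block_size for row in a_block for k in row})
        for b_idx in needed:
            b_block = load_block(b_paths[b_idx])
            base = b_idx * block_size
            for acc, row in zip(accs, a_block):
                for k, a_val in row.items():
                    if base <= k < base + len(b_block):
                        for j, b_val in b_block[k - base].items():
                            acc[j] += a_val * b_val
        result.extend(dict(acc) for acc in accs)
    return result


def demo():
    """Multiply two tiny 2x2 sparse matrices through on-disk blocks."""
    a = [{0: 1.0, 1: 2.0}, {1: 3.0}]
    b = [{0: 4.0}, {0: 5.0, 1: 6.0}]
    with tempfile.TemporaryDirectory() as d:
        a_paths = write_blocks(a, 1, d, "a")
        b_paths = write_blocks(b, 1, d, "b")
        return spgemm_out_of_core(a_paths, b_paths, 1)
```

In this sketch, memory use is bounded by one block of each input plus the accumulators for the current output rows. The cost is that a block of B may be read repeatedly, which is the kind of I/O overhead a storage-based design has to manage.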
