SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs

Tuan Ta, Joshua Randall, and Christopher Batten

arXiv:2502.11353·cs.AR·February 18, 2025

Open Access

TL;DR

SparseZipper minimally modifies existing matrix extensions and their systolic-array implementations so that CPUs can accelerate sparse-sparse matrix multiplication (SpGEMM), avoiding the wasted work that dense hardware spends on zero values while keeping area overhead low.

Contribution

It introduces SparseZipper, which adapts matrix extensions and systolic-array-based micro-architectures designed for dense-dense GEMM so that they also accelerate sparse-sparse GEMM on highly sparse, unstructured matrices, at a small area cost.

Findings

01. Achieves nearly 6x speedup over scalar hash-based SpGEMM

02. More than 2.6x faster than state-of-the-art vectorized SpGEMM

03. Increases systolic array area by only 12.7%

Abstract

The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often implemented using high-performance systolic arrays. However, matrices in emerging workloads are not always dense, and sparse matrices, in which the vast majority of values are zero, are becoming more common. Existing matrix extensions and micro-architectures cannot efficiently process highly sparse matrices for two reasons: (1) work is wasted whenever one or both input values are zero; and (2) they are incompatible with sparse matrix formats. This work proposes SparseZipper, which minimally modifies existing matrix extensions and systolic-array-based micro-architectures specialized for dense-dense GEMM so that they accelerate sparse-sparse GEMM operating on highly sparse matrices with unstructured sparsity structures.…
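To make the two problems named in the abstract concrete, the sketch below shows a scalar, hash-based, row-wise SpGEMM over a compressed sparse row (CSR) layout, and compares the multiplications it performs with the count a dense GEMM would need. It illustrates the kind of scalar baseline the findings mention, not SparseZipper itself. The names `CSR`, `dense_to_csr`, and `spgemm_hash` are hypothetical, and the assumption that the baseline is row-wise with a per-row hash accumulator is ours, not the paper's.

```python
from collections import namedtuple

# Assumed CSR layout: row i owns entries indptr[i]..indptr[i+1]-1
# of the parallel `indices` (column ids) and `data` (values) arrays.
CSR = namedtuple("CSR", ["n_rows", "n_cols", "indptr", "indices", "data"])


def dense_to_csr(dense):
    """Convert a list-of-lists matrix to CSR, dropping zeros."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    n_cols = len(dense[0]) if dense else 0
    return CSR(len(dense), n_cols, indptr, indices, data)


def spgemm_hash(a, b):
    """Row-wise SpGEMM (C = A @ B) with a dict as the per-row accumulator.

    Returns the CSR result and the number of multiplications performed,
    all of which involve two stored (nonzero) operands.
    """
    assert a.n_cols == b.n_rows
    indptr, indices, data = [0], [], []
    useful_mults = 0
    for i in range(a.n_rows):
        acc = {}  # column id -> partial sum for output row i
        for p in range(a.indptr[i], a.indptr[i + 1]):
            k, a_ik = a.indices[p], a.data[p]
            # Scale row k of B by a_ik and merge it into the accumulator.
            for q in range(b.indptr[k], b.indptr[k + 1]):
                j = b.indices[q]
                acc[j] = acc.get(j, 0) + a_ik * b.data[q]
                useful_mults += 1
        for j in sorted(acc):  # emit the row with sorted column ids
            indices.append(j)
            data.append(acc[j])
        indptr.append(len(indices))
    return CSR(a.n_rows, b.n_cols, indptr, indices, data), useful_mults


A = [[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0], [0, 3, 0, 4]]
B = [[0, 5, 0, 0], [0, 0, 0, 6], [7, 0, 0, 0], [0, 0, 8, 0]]
C, useful = spgemm_hash(dense_to_csr(A), dense_to_csr(B))
dense_mults = len(A) * len(B) * len(B[0])  # m * k * n for a dense GEMM
print(f"useful multiplications: {useful} of {dense_mults} dense ones")
```

On these 4x4 inputs the sparse routine performs 4 multiplications where a dense GEMM would perform 64, which is the wasted work of reason (1). The `indptr`/`indices`/`data` arrays are also not something a dense matrix tile can consume directly, which is the format mismatch of reason (2).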

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Advanced Data Storage Technologies