MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity

Vladimír Macko; Vladimír Boža

arXiv:2511.13061 · cs.LG · November 18, 2025

Open Access

TL;DR

MACKO-SpMV is a GPU sparse matrix-vector multiplication format and kernel for LLMs with low, unstructured sparsity. At 50% sparsity it reduces memory by 1.5x and runs 1.2-1.5x faster than the dense representation, without requiring specialized hardware units such as tensor cores.

Contribution

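The paper introduces MACKO-SpMV, a GPU-optimized storage format and kernel co-designed to reduce storage overhead while remaining compatible with the GPU's execution model. This enables efficient SpMV for unstructured sparsity in LLM inference without specialized hardware units or format-specific precomputation.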

Findings

01

1.5x memory reduction at 50% sparsity

02

1.2-1.5x speedup over the dense representation at 50% sparsity

03

Speedups over existing SpMV baselines: 2.8-13.0x over cuSPARSE, 1.9-2.6x over Sputnik, and 2.2-2.5x over DASP

Abstract

Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in the inference of sparse Large Language Models (LLMs). Because existing SpMV methods perform poorly under the low and unstructured sparsity (30-90%) commonly observed in pruned LLMs, unstructured pruning provided only limited memory reduction and speedup. We propose MACKO-SpMV, a GPU-optimized format and kernel co-designed to reduce storage overhead while preserving compatibility with the GPU's execution model. This enables efficient SpMV for unstructured sparsity without specialized hardware units (e.g., tensor cores) or format-specific precomputation. Empirical results show that at sparsity 50%, MACKO is the first approach with significant 1.5x memory reduction and 1.2-1.5x speedup over dense representation. Speedups over other SpMV baselines: 2.8-13.0x over cuSPARSE, 1.9-2.6x over Sputnik, and 2.2-2.5x over DASP.…
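
For readers new to the operation, here is a minimal sketch of SpMV using the textbook CSR (compressed sparse row) layout. It is not the MACKO format, which this page does not describe. The helper names (`dense_to_csr`, `csr_spmv`, `csr_bytes`, `dense_bytes`) are hypothetical, and the 2-byte value and 4-byte index widths are assumptions chosen for illustration. Under those assumptions, the storage estimate suggests why index overhead can cancel the savings from pruning at low sparsity.

```python
def dense_to_csr(matrix):
    """Convert a list-of-lists dense matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # end offset of this row
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is given in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_bytes(rows, cols, value_bytes=2):
    """Storage for a dense matrix (assumed 2-byte values)."""
    return rows * cols * value_bytes


def csr_bytes(rows, cols, density, value_bytes=2, index_bytes=4):
    """Storage for CSR (assumed 2-byte values, 4-byte indices)."""
    nnz = int(rows * cols * density)
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


A = [[1, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 4, 0, 0]]
x = [1, 2, 3, 4]
y = csr_spmv(*dense_to_csr(A), x)

# Ratio above 1 means CSR takes more memory than dense at 50% sparsity.
ratio = csr_bytes(4096, 4096, 0.5) / dense_bytes(4096, 4096)
```

With these assumed widths, each stored nonzero costs 6 bytes against 2 bytes per dense entry, so a 50% sparse matrix in CSR takes more memory than its dense counterpart. Storage overhead of this kind is what the abstract says the MACKO format is designed to reduce.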

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy