Space Filling Curves is All You Need: Communication-Avoiding Matrix Multiplication Made Simple

Evangelos Georganas; Alexander Heinecke; Pradeep Dubey

arXiv:2601.16294·cs.DC·April 9, 2026

TL;DR

This paper introduces a matrix multiplication approach based on space filling curves that reduces both data movement and tuning complexity, achieving state-of-the-art performance across multiple CPU platforms.

Contribution

The paper presents a platform- and shape-oblivious matrix multiplication scheme built on space filling curves. The scheme preserves high data locality and allows communication-avoiding algorithms to be integrated seamlessly.

Findings

01 Outperforms vendor libraries by up to 5.5x on GEMM shapes

02 Achieves up to a 1.85x speedup in LLM inference prefill

03 Real-world applications see up to a 2.2x speedup

Abstract

General Matrix Multiplication (GEMM) is the cornerstone of HPC workloads and Deep Learning. State-of-the-art vendor libraries tune tensor layouts, parallelization schemes, and cache blocking to minimize data movement across the memory hierarchy and maximize throughput. Optimal settings for these parameters depend on the target platform and matrix shapes, making exhaustive tuning infeasible. We revisit Space Filling Curves (SFC) to alleviate this cumbersome tuning. We partition the Matrix Multiplication using advancements in SFC, and obtain platform-oblivious and shape-oblivious Matrix Multiplication schemes with a high degree of data locality. We extend the SFC-based work partitioning to implement Communication-Avoiding (CA) algorithms that provably minimize data movement. The integration of CA-algorithms is seamless with compact code, achieving state-of-the-art results on multiple CPU…
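
To make the idea of SFC-based work partitioning concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the output matrix C is split into a 2^order x 2^order grid of tiles and uses the classical Hilbert curve as a stand-in for the more recent SFC constructions the abstract refers to. All function names (`hilbert_d2xy`, `partition_tiles`, `panels_touched`) are hypothetical and chosen for illustration only. The sketch orders the C tiles along the curve, gives each thread a contiguous slice of that order, and counts how many distinct row panels of A and column panels of B each thread must read.

```python
def hilbert_d2xy(order, d):
    """Map curve index d in [0, 4**order) to tile coordinates (x, y)
    on a 2**order x 2**order grid (classical Hilbert curve)."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the sub-square so consecutive pieces connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def split_contiguous(tiles, num_threads):
    """Give each thread a contiguous slice of an ordered tile list."""
    total = len(tiles)
    return [tiles[t * total // num_threads:(t + 1) * total // num_threads]
            for t in range(num_threads)]


def partition_tiles(order, num_threads):
    """Order the C tiles along the Hilbert curve, then split contiguously."""
    tiles = [hilbert_d2xy(order, d) for d in range(4 ** order)]
    return split_contiguous(tiles, num_threads)


def partition_row_major(order, num_threads):
    """Baseline for comparison: row-major tile order, split contiguously."""
    n = 1 << order
    tiles = [(i, j) for i in range(n) for j in range(n)]
    return split_contiguous(tiles, num_threads)


def panels_touched(chunk):
    """Distinct A row panels plus distinct B column panels a thread needs
    to compute its C tiles (a crude proxy for data movement)."""
    rows = {i for i, _ in chunk}
    cols = {j for _, j in chunk}
    return len(rows) + len(cols)


if __name__ == "__main__":
    for name, part in (("hilbert", partition_tiles),
                       ("row-major", partition_row_major)):
        print(name, [panels_touched(c) for c in part(3, 4)])
```

For an 8 x 8 tile grid split over 4 threads, each Hilbert chunk is a 4 x 4 quadrant and touches 8 panels, while each row-major chunk spans 2 full tile rows and touches 10. The gap is small here, but the Hilbert ordering is computed the same way for any power-of-two grid size, with no per-platform blocking parameters. Handling arbitrary rectangular grids would need a generalized curve, which this sketch does not cover.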
