Space Filling Curves is All You Need: Communication-Avoiding Matrix Multiplication Made Simple
Evangelos Georganas, Alexander Heinecke, Pradeep Dubey

TL;DR
This paper introduces a space filling curve-based approach to matrix multiplication that reduces data movement and tuning complexity, achieving state-of-the-art performance across multiple CPU platforms.
Contribution
It presents a platform- and shape-oblivious matrix multiplication scheme using space filling curves, enabling seamless integration of communication-avoiding algorithms with high data locality.
Findings
Outperforms vendor libraries by up to 5.5x on GEMM shapes
Achieves up to 1.85x speedup in LLM inference prefill
Real-world applications see up to 2.2x speedup
Abstract
General Matrix Multiplication (GEMM) is the cornerstone of HPC workloads and Deep Learning. State-of-the-art vendor libraries tune tensor layouts, parallelization schemes, and cache blocking to minimize data movement across the memory hierarchy and maximize throughput. Optimal settings for these parameters depend on the target platform and matrix shapes, making exhaustive tuning infeasible. We revisit Space Filling Curves (SFC) to alleviate this cumbersome tuning. We partition the Matrix Multiplication using advancements in SFC, and obtain platform-oblivious and shape-oblivious Matrix Multiplication schemes with a high degree of data locality. We extend the SFC-based work partitioning to implement Communication-Avoiding (CA) algorithms that provably minimize data movement. The integration of CA-algorithms is seamless with compact code, achieving state-of-the-art results on multiple CPU…
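To make the idea of SFC-based work partitioning concrete, the following is a minimal sketch and not the authors' implementation. It assumes a square matrix whose tile grid has a power-of-two side, and it uses the classic Hilbert curve to order the output tiles of C; the abstract describes a shape-oblivious scheme, which this simplified version does not attempt to reproduce. The names `hilbert_d2xy`, `sfc_tile_order`, `partition_curve`, and `matmul_sfc` are hypothetical and chosen only for illustration.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n-by-n grid.

    Assumption: n is a power of two (classic Hilbert curve).
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sfc_tile_order(tiles_per_side):
    """List the tile coordinates of C in Hilbert-curve order."""
    total = tiles_per_side * tiles_per_side
    return [hilbert_d2xy(tiles_per_side, d) for d in range(total)]


def partition_curve(order, num_workers):
    """Give each worker one contiguous segment of the curve."""
    chunk = -(-len(order) // num_workers)  # ceiling division
    return [order[i:i + chunk] for i in range(0, len(order), chunk)]


def matmul_sfc(A, B, tile):
    """Compute C = A @ B, visiting the output tiles in curve order.

    Assumption: A and B are square lists of lists, and len(A) // tile
    is a power of two.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ti, tj in sfc_tile_order(n // tile):
        for i in range(ti * tile, (ti + 1) * tile):
            for j in range(tj * tile, (tj + 1) * tile):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C
```

Consecutive tiles on the curve are always grid neighbours, so each step reuses either a row panel of A or a column panel of B from the previous tile. Splitting the curve into contiguous segments, as `partition_curve` does, gives each worker a compact region of C rather than a long strip, which is the locality property the abstract refers to.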
