Improving Locality in Sparse and Dense Matrix Multiplications
Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi

TL;DR
This paper introduces tile fusion, a runtime technique that enhances data locality in consecutive sparse and dense matrix multiplications, significantly improving performance on multi-core CPUs.
Contribution
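The paper presents tile fusion, a runtime method that fuses tiles of two consecutive matrix-matrix multiplications, at least one of which involves a sparse matrix, to improve data locality despite irregular dependencies. It outperforms unfused baseline and MKL implementations.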
Findings
Tile fusion achieves nearly 2x speedup over baseline implementations.
It effectively improves data locality in sparse and dense matrix multiplications.
Performance gains are consistent across different multi-core CPU configurations.
Abstract
Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality, it presents a challenge due to the irregular dependencies between iterations across the two multiplication operations. Existing fusion methods often introduce excessive synchronization overhead or overlapped computations with limited benefits. This paper proposes tile fusion, a runtime approach that fuses tiles of the two matrix-matrix multiplications, where at least one of the involved matrices is sparse. Tile fusion aims to improve data locality while providing sufficient workload for cores in shared-memory multi-core processors. For a pair of matrix-matrix multiplications, tile fusion outperforms unfused baseline and MKL implementations with a…
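To make the dependency problem in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's scheduler: it assumes the pair of operations is T = A·B followed by D = A·T with the same square sparse matrix A stored in CSR form, and the function names (`spmm_row`, `unfused`, `tile_fused`) and the fixed `tile` size are hypothetical. A row of D is computed inside a tile only when every row of T it reads was produced in that same tile. All other rows are deferred to a second pass, which in a parallel implementation would sit behind a single synchronization point.

```python
def spmm_row(indptr, indices, data, dense, i):
    """Return row i of (CSR matrix) @ dense as a list of floats."""
    width = len(dense[0])
    out = [0.0] * width
    for p in range(indptr[i], indptr[i + 1]):
        j, a = indices[p], data[p]
        row = dense[j]
        for c in range(width):
            out[c] += a * row[c]
    return out


def unfused(indptr, indices, data, B):
    """Baseline: finish T = A @ B completely, then compute D = A @ T."""
    n = len(indptr) - 1
    T = [spmm_row(indptr, indices, data, B, i) for i in range(n)]
    return [spmm_row(indptr, indices, data, T, i) for i in range(n)]


def tile_fused(indptr, indices, data, B, tile):
    """Illustrative fusion (an assumption, not the paper's algorithm).

    For each tile of rows [lo, hi): compute the rows of T, then right away
    compute every row of D whose nonzero columns all lie in [lo, hi), while
    those rows of T are still likely to be in cache. Rows with dependencies
    outside the tile are deferred until all of T exists.
    """
    n = len(indptr) - 1
    T = [None] * n
    D = [None] * n
    deferred = []
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        for i in range(lo, hi):
            T[i] = spmm_row(indptr, indices, data, B, i)
        for i in range(lo, hi):
            cols = indices[indptr[i]:indptr[i + 1]]
            if all(lo <= j < hi for j in cols):
                D[i] = spmm_row(indptr, indices, data, T, i)
            else:
                deferred.append(i)
    # Second pass: every row of T is now available.
    for i in deferred:
        D[i] = spmm_row(indptr, indices, data, T, i)
    return D, deferred


# 4x4 example: A is block-diagonal except for entry (3, 0).
indptr = [0, 2, 3, 4, 6]
indices = [0, 1, 1, 2, 0, 3]
data = [1.0, 2.0, 1.0, 1.0, 1.0, 1.0]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

D_ref = unfused(indptr, indices, data, B)
D_fused, deferred = tile_fused(indptr, indices, data, B, tile=2)
```

With `tile=2`, rows 0 to 2 of D are computed inside their tiles, and only row 3, which reads row 0 of T from the other tile, is deferred. The fused result matches the unfused one. The fewer rows land in the deferred pass, the more of T is reused while still in cache, which is the locality benefit the abstract refers to.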
Taxonomy
Topics: Medical Image Segmentation Techniques · Brain Tumor Detection and Classification · Distributed and Parallel Computing Systems
