Cache oblivious storage and access heuristics for blocked matrix-matrix multiplication

Nicolas Bock; Emanuel H. Rubensson; Paweł Sałek; Anders M. N. Niklasson; Matt Challacombe

arXiv:0808.1108·cs.DS·August 15, 2008

TL;DR

This paper explores how the order of operations in blocked matrix multiplication affects performance. It shows that submatrices stored non-contiguously can still reach near-optimal performance when the submatrix multiplications are executed in a good order, with the largest gains (up to four times) for small blocks.

Contribution

It demonstrates that the execution order of the submatrix multiplications, rather than contiguous storage of the submatrices, is what matters most for the performance of blocked matrix multiplication.
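To make "execution order" concrete: with each matrix split into an nb × nb grid of blocks, the product consists of nb³ block multiplications C_ij += A_ik B_kj, one per index triple (i, j, k), and these may be issued in any order. The sketch below is only an illustration under stated assumptions: it contrasts a plain i-j-k loop nest with the same triples sorted along a three-dimensional Morton (Z-order) curve, a standard cache-oblivious traversal. The Z-order choice and the helper names (`interleave_bits`, `loop_nest_order`, `morton_order`) are hypothetical and not necessarily the ordering used by the authors.

```python
from itertools import product


def interleave_bits(i: int, j: int, k: int, bits: int) -> int:
    """Morton (Z-order) key: interleave the low `bits` bits of i, j and k."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b + 2)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b)
    return key


def loop_nest_order(nb: int):
    """Block products (i, j, k) in plain i-j-k loop-nest order (k fastest)."""
    return list(product(range(nb), repeat=3))


def morton_order(nb: int):
    """The same nb**3 block products, sorted along a 3-D Z-order curve.

    Hypothetical schedule for illustration: nearby triples in this order
    tend to share A, B and C blocks, which is what a cache rewards."""
    bits = max(1, (nb - 1).bit_length())
    return sorted(product(range(nb), repeat=3),
                  key=lambda t: interleave_bits(t[0], t[1], t[2], bits))
```

For nb = 4 both schedules contain the same 64 products and differ only in sequence. The Z-order schedule starts with (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1) and completes all eight triples of the 2 × 2 × 2 corner of index space before moving on, so during that stretch it touches only four blocks of each of A, B and C.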

Findings

01

The execution order of the submatrix multiplications significantly affects performance.

02

Non-contiguous submatrix storage can still be efficient.

03

Speedup of up to four times for small block sizes.

Abstract

We investigate the effects of ordering in blocked matrix-matrix multiplication. We find that submatrices do not have to be stored contiguously in memory to achieve near optimal performance. Instead, it is the choice of execution order of the submatrix multiplications that leads to a speedup of up to four times for small block sizes. This is in contrast to results for single matrix elements showing that contiguous memory allocation quickly becomes irrelevant as the block size increases.
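The storage side of the abstract can be illustrated with a small sketch in which each submatrix is a separately allocated object looked up by its block index, so nothing ties the blocks to one contiguous buffer, and the block products run in whatever order the caller supplies. This is a hypothetical pure-Python illustration (the names `split_into_blocks`, `block_gemm`, `blocked_matmul` and `demo` are assumptions, not the authors' code) and says nothing about speed, since Python does not expose memory layout. It only shows that every execution order yields the same product, which is what leaves the order free to be tuned for cache behavior.

```python
from itertools import product
import random


def zeros(b):
    """A fresh b x b block of zeros."""
    return [[0.0] * b for _ in range(b)]


def split_into_blocks(M, b):
    """Copy an n x n matrix (n divisible by b) into separately allocated
    b x b blocks keyed by block index (i, j). Each block is its own
    Python object, so the blocks need not sit contiguously in memory."""
    nb = len(M) // b
    return {
        (i, j): [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
        for i in range(nb)
        for j in range(nb)
    }


def block_gemm(Cb, Ab, Bb, b):
    """Accumulate one block product in place: Cb += Ab @ Bb."""
    for r in range(b):
        for t in range(b):
            a = Ab[r][t]
            for c in range(b):
                Cb[r][c] += a * Bb[t][c]


def blocked_matmul(A_blocks, B_blocks, nb, b, order):
    """Execute the block products C_ij += A_ik B_kj in the given order.

    `order` must contain each triple (i, j, k) exactly once; the result
    does not depend on it, up to floating-point rounding."""
    C_blocks = {(i, j): zeros(b) for i in range(nb) for j in range(nb)}
    for i, j, k in order:
        block_gemm(C_blocks[(i, j)], A_blocks[(i, k)], B_blocks[(k, j)], b)
    return C_blocks


def demo(n=6, b=2, seed=0):
    """Multiply random small-integer matrices with two different orders."""
    rng = random.Random(seed)
    A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    nb = n // b
    A_blocks, B_blocks = split_into_blocks(A, b), split_into_blocks(B, b)
    in_order = list(product(range(nb), repeat=3))  # i-j-k loop nest
    shuffled = in_order[:]
    rng.shuffle(shuffled)                          # an arbitrary order
    C1 = blocked_matmul(A_blocks, B_blocks, nb, b, in_order)
    C2 = blocked_matmul(A_blocks, B_blocks, nb, b, shuffled)
    return A, B, C1, C2
```

With the defaults, `demo()` multiplies two 6 × 6 integer matrices using 2 × 2 blocks (nb = 3, so 27 block products) once in loop-nest order and once in a shuffled order. Because the entries are small integers, the floating-point sums are exact and both runs give identical blocks, each equal to the corresponding part of the ordinary matrix product.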

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems