# Fast Strassen-based $A^t A$ Parallel Multiplication

**Authors:** Viviana Arrigoni, Annalisa Massini

arXiv: 1902.02104 · 2019-02-07

## TL;DR

This paper introduces a cache-oblivious, parallel algorithm for efficiently computing the matrix product $A^t A$ using Strassen's method, demonstrating good scalability and minimal communication overhead.

## Contribution

It presents a novel parallel, cache-oblivious algorithm for $A^t A$ multiplication that leverages Strassen's algorithm and exploits matrix symmetry for memory efficiency.
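The idea can be illustrated with a short sequential sketch. It is not taken from the paper, whose full text is omitted here. It assumes a 2×2 block partition of $A$, in which the diagonal blocks of $A^t A$ are themselves sums of smaller $A^t A$ products and the off-diagonal block is a sum of two general products computed with Strassen's algorithm. The function names (`strassen`, `ata_lower`, `symmetrize`), the `cutoff` parameter, and the fallback to NumPy for small or odd-sized blocks are all assumptions made for this illustration. Only the lower triangle is computed, which is one way to exploit the symmetry of the result.

```python
import numpy as np


def strassen(X, Y, cutoff=64):
    """Strassen product X @ Y (illustrative sketch).

    Falls back to NumPy's matmul when a block is small or has an odd dimension.
    """
    m, k = X.shape
    _, n = Y.shape
    if min(m, k, n) <= cutoff or m % 2 or k % 2 or n % 2:
        return X @ Y
    mh, kh, nh = m // 2, k // 2, n // 2
    X11, X12, X21, X22 = X[:mh, :kh], X[:mh, kh:], X[mh:, :kh], X[mh:, kh:]
    Y11, Y12, Y21, Y22 = Y[:kh, :nh], Y[:kh, nh:], Y[kh:, :nh], Y[kh:, nh:]
    # The seven Strassen products.
    M1 = strassen(X11 + X22, Y11 + Y22, cutoff)
    M2 = strassen(X21 + X22, Y11, cutoff)
    M3 = strassen(X11, Y12 - Y22, cutoff)
    M4 = strassen(X22, Y21 - Y11, cutoff)
    M5 = strassen(X11 + X12, Y22, cutoff)
    M6 = strassen(X21 - X11, Y11 + Y12, cutoff)
    M7 = strassen(X12 - X22, Y21 + Y22, cutoff)
    top = np.hstack([M1 + M4 - M5 + M7, M3 + M5])
    bottom = np.hstack([M2 + M4, M1 - M2 + M3 + M6])
    return np.vstack([top, bottom])


def ata_lower(A, cutoff=64):
    """Lower triangle of A.T @ A via a recursive 2x2 block scheme (sketch)."""
    m, n = A.shape
    if min(m, n) <= cutoff or m % 2 or n % 2:
        return np.tril(A.T @ A)
    mh, nh = m // 2, n // 2
    A11, A12, A21, A22 = A[:mh, :nh], A[:mh, nh:], A[mh:, :nh], A[mh:, nh:]
    C = np.zeros((n, n), dtype=A.dtype)
    # Diagonal blocks are symmetric: recurse, keeping lower triangles only.
    C[:nh, :nh] = ata_lower(A11, cutoff) + ata_lower(A21, cutoff)
    C[nh:, nh:] = ata_lower(A12, cutoff) + ata_lower(A22, cutoff)
    # The off-diagonal block is a general product: use Strassen.
    C[nh:, :nh] = strassen(A12.T, A11, cutoff) + strassen(A22.T, A21, cutoff)
    return C  # the upper-right block is never computed or stored


def symmetrize(L):
    """Rebuild the full symmetric matrix from its lower triangle."""
    return L + np.tril(L, -1).T


rng = np.random.default_rng(0)
A = rng.standard_normal((256, 128))  # rectangular input
C = symmetrize(ata_lower(A, cutoff=16))
```

In this sketch the upper-right block is never formed; `symmetrize` is needed only when a caller wants the full matrix. The paper's parallel MPI implementation and its communication scheme are not represented here.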

## Key findings

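- Reduces the computational cost of the conventional $A^t A$ multiplication to $\frac{2}{7}n^{\log_2 7}$.
- Shows good scalability and speed-up in tests on a small subset of nodes of the Galileo cluster, helped by the minimal number of messages exchanged.
- Parallel overhead and the inherently sequential time fraction are negligible in the tested configurations.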

## Abstract

Matrix multiplication $A^t A$ appears as an intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle \mathsf{T}$A, calls the classical Strassen's algorithm as a sub-routine, decreasing the computational cost (expressed in number of performed products) of the conventional $A^t A$ multiplication to $\frac{2}{7}n^{\log_2 7}$. It works for generic rectangular matrices and exploits the peculiar symmetry of the resulting product matrix to save memory. We used the MPI paradigm to implement A$\scriptstyle \mathsf{T}$A in parallel, and we tested its performance on a small subset of nodes of the Galileo cluster. Experiments highlight good scalability and speed-up, also thanks to the minimal number of exchanged messages in the designed communication system. Parallel overhead and inherently sequential time fraction are negligible in the tested configurations.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.02104/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1902.02104/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1902.02104/full.md

---
Source: https://tomesphere.com/paper/1902.02104