Fault-Tolerant Strassen-Like Matrix Multiplication
Osman B. Guney, Suayb S. Arslan

TL;DR
This paper introduces a fault-tolerant matrix multiplication method using two different Strassen-like algorithms and additional parity computations, achieving high reliability with fewer compute nodes.
Contribution
It presents a novel fault-tolerance approach for Strassen-like algorithms that combines two distinct algorithms with parity checks, cutting the number of compute nodes while keeping fault tolerance close to that of three-copy replication.
Findings
Outperforms two-copy Strassen-like algorithms in fault tolerance.
Achieves similar performance to three-copy methods with fewer nodes.
Reduces the total number of compute nodes by approximately 24% (from 21 to 16) relative to the three-copy version.
Abstract
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplication. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We observe that when two different algorithms are used, new check relations arise, resulting in more local computations. These local computations are found using a computer-aided search. To improve performance, two special parity (extra) sub-matrix multiplications (PSMMs) are generated, at the expense of increasing the communication/computation cost of the system. Our preliminary results demonstrate that the proposed method outperforms a Strassen-like algorithm with two copies and secures performance very close to that of the three-copy version using only 2 PSMMs, reducing the total number of compute nodes by around 24%, i.e., from 21 to 16.
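The idea of running two distinct algorithms instead of two copies of one can be illustrated with a minimal sketch. It assumes the two algorithms are Strassen's original scheme and the Winograd variant (the abstract does not name them), and it uses scalars in place of sub-matrix blocks. The paper's actual check relations and PSMMs come from a computer-aided search and are not reproduced here; the sketch only shows that two different sets of seven products yield the same result, so a corrupted product shows up as a disagreement. The node counts at the end read 21 as three copies of seven products and 16 as two sets of seven plus 2 PSMMs, which is an interpretation of the figures in the abstract.

```python
# Illustrative sketch only: not the authors' check relations or PSMM construction.

def strassen_products(A, B):
    """Seven products of Strassen's original 2x2 scheme."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return [
        (a11 + a22) * (b11 + b22),  # M1
        (a21 + a22) * b11,          # M2
        a11 * (b12 - b22),          # M3
        a22 * (b21 - b11),          # M4
        (a11 + a12) * b22,          # M5
        (a21 - a11) * (b11 + b12),  # M6
        (a12 - a22) * (b21 + b22),  # M7
    ]

def strassen_combine(M):
    """Assemble C from the seven Strassen products."""
    m1, m2, m3, m4, m5, m6, m7 = M
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def winograd_products(A, B):
    """Seven products of the Winograd variant (a different product set)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    s1 = a21 + a22
    s2 = s1 - a11
    s3 = a11 - a21
    s4 = a12 - s2
    t1 = b12 - b11
    t2 = b22 - t1
    t3 = b22 - b12
    t4 = t2 - b21
    return [a11 * b11, a12 * b21, s4 * b22, a22 * t4,
            s1 * t1, s2 * t2, s3 * t3]  # P1..P7

def winograd_combine(P):
    """Assemble C from the seven Winograd products."""
    p1, p2, p3, p4, p5, p6, p7 = P
    u2 = p1 + p6
    u3 = u2 + p7
    u4 = u2 + p5
    return [[p1 + p2, u4 + p3],
            [u3 - p4, u3 + p5]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

M = strassen_products(A, B)
P = winograd_products(A, B)
C_strassen = strassen_combine(M)
C_winograd = winograd_combine(P)

# Simulate one faulty compute node: corrupt product M3 of the Strassen set.
faulty = list(M)
faulty[2] += 1
C_faulty = strassen_combine(faulty)
mismatch_detected = C_faulty != C_winograd

# Node counts as read from the abstract (assumed breakdown).
nodes_three_copy = 3 * 7      # three copies of a 7-product algorithm
nodes_proposed = 2 * 7 + 2    # two distinct algorithms plus 2 PSMMs
reduction = (nodes_three_copy - nodes_proposed) / nodes_three_copy

print(C_strassen, C_winograd, mismatch_detected)
print(nodes_three_copy, nodes_proposed, f"{reduction:.1%}")
```

Comparing final outputs only detects a fault; it does not say which product failed. The paper's check relations operate on the products themselves, which is what allows recovery with fewer nodes than full triplication.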
