Fast and Practical Strassen's Matrix Multiplication using FPGAs
Afzal Ahmad, Linfeng Du, Wei Zhang

TL;DR
This paper introduces an FPGA-based implementation of Strassen's matrix multiplication algorithm that outperforms an optimized General Matrix Multiply (GeMM) baseline for matrices as small as 256x256, making Strassen's algorithm practical at small to medium sizes.
Contribution
The paper presents a novel FPGA implementation of Strassen's algorithm that is faster and more practical for small to medium matrices than previous approaches.
Findings
Achieves superior speed over optimized GeMM for matrices as small as 256x256.
Extensively tested on Alveo U50 and U280 FPGA accelerators.
Matches or surpasses baseline performance across various data types.
Abstract
Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $O(n^3)$ for $n \times n$ matrices. Strassen's algorithm improves this to $O(n^{2.807})$, but its practicality is limited for small to medium matrix sizes due to the large number of additions it introduces. This paper presents a novel FPGA-based implementation of Strassen's algorithm that achieves superior speed over an optimized General Matrix Multiply (GeMM) implementation for matrices as small as $256 \times 256$. Our design, tested extensively on two high-performance FPGA accelerators (Alveo U50 and U280) across various data types, matches or surpasses the performance of a highly optimized baseline across a range of matrix sizes.
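For readers unfamiliar with the trade-off the abstract describes, the following is a minimal software sketch of the classic Strassen recursion in plain Python. It is not the paper's FPGA design: the function names and the `cutoff` parameter are illustrative assumptions, and the sketch assumes square matrices. It shows how each level replaces 8 block multiplications with 7, at the cost of 18 block additions and subtractions, which is the overhead that makes the algorithm hard to use profitably at small sizes.

```python
# Illustrative sketch only: textbook Strassen recursion in software,
# not the FPGA architecture described in the paper.

def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def naive_matmul(a, b):
    """Standard O(n^3) matrix multiplication."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def strassen(a, b, cutoff=2):
    """Multiply square matrices a and b; `cutoff` is a hypothetical
    threshold below which the standard algorithm is used."""
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    # Recombine the products into the four output quadrants.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Calling `strassen(a, b)` on two square lists of lists returns the same product as `naive_matmul(a, b)`. In software, the cutoff is usually tuned so that the extra additions do not outweigh the saved multiplication.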
Taxonomy
Topics: Numerical Methods and Algorithms · VLSI and FPGA Design Techniques · Digital Filter Design and Implementation
