Fast and Practical Strassen's Matrix Multiplication using FPGAs

Afzal Ahmad; Linfeng Du; Wei Zhang

arXiv:2406.02088·cs.AR·June 5, 2024

TL;DR

This paper introduces an FPGA-based implementation of Strassen's matrix multiplication algorithm that is faster than an optimized General Matrix Multiply (GeMM) implementation for matrices as small as 256x256, which makes Strassen's algorithm practical at small to medium matrix sizes.

Contribution

The paper presents a novel FPGA implementation of Strassen's algorithm that stays practical at small to medium matrix sizes, where the extra additions the algorithm introduces usually limit its usefulness.

Findings

01

Achieves superior speed over optimized GeMM for matrices as small as 256x256.

02

Extensively tested on Alveo U50 and U280 FPGA accelerators.

03

Matches or surpasses a highly optimized GeMM baseline across a range of matrix sizes and data types.

Abstract

Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $O(n^{3})$ for $n \times n$ matrices. Strassen's algorithm improves this to $O(n^{2.807})$, but its practicality is limited for small to medium matrix sizes due to the large number of additions it introduces. This paper presents a novel FPGA-based implementation of Strassen's algorithm that achieves superior speed over an optimized General Matrix Multiply (GeMM) implementation for matrices as small as $n = 256$. Our design, tested extensively on two high-performance FPGA accelerators (Alveo U50 and U280) across various data types, matches or surpasses the performance of a highly optimized baseline across a range of matrix sizes.
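To make the trade-off in the abstract concrete, the following is a minimal software sketch of the textbook Strassen recursion in pure Python. It is not the authors' FPGA design: the function names, the `cutoff` parameter, and the list-of-lists matrix representation are assumptions chosen for illustration, and the sketch assumes square matrices (it falls back to the classical algorithm whenever the size is odd or at most `cutoff`).

```python
def mat_add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def classical_matmul(A, B):
    """Standard O(n^3) matrix product, used as the base case."""
    Bt = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Textbook Strassen recursion for square matrices (illustrative only).

    `cutoff` is a hypothetical tuning knob: at or below this size, or when
    the size is odd, the classical algorithm is used instead.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return classical_matmul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven half-size products instead of the classical eight.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22), cutoff)
    M2 = strassen(mat_add(A21, A22), B11, cutoff)
    M3 = strassen(A11, mat_sub(B12, B22), cutoff)
    M4 = strassen(A22, mat_sub(B21, B11), cutoff)
    M5 = strassen(mat_add(A11, A12), B22, cutoff)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12), cutoff)
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22), cutoff)

    # Recombine the products into the four quadrants of C.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Each level of this recursion performs 7 half-size products and 18 half-size additions or subtractions, compared with 8 products and 4 additions for the classical block algorithm. Applying it recursively gives the $O(n^{\log_2 7}) \approx O(n^{2.807})$ complexity quoted above, and the extra additions are the overhead the abstract refers to when it says Strassen's algorithm is hard to make practical at small sizes.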

Code & Models

Repositories

afzalxo/FFGEMM (Official)

Taxonomy

Topics: Numerical Methods and Algorithms · VLSI and FPGA Design Techniques · Digital Filter Design and Implementation