High Level Synthesis Implementation of a Three-dimensional Systolic Array Architecture for Matrix Multiplications on Intel Stratix 10 FPGAs
Paolo Gorlani, Christian Plessl

TL;DR
This paper presents a high-level synthesis (HLS) implementation of a three-dimensional systolic array for matrix multiplication on Intel Stratix 10 FPGAs. The resulting designs use 99% of the available DSPs and exceed 3 TFLOPS of floating-point throughput.
Contribution
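It investigates an HLS implementation of a three-dimensional systolic array architecture that targets specific characteristics of Intel Stratix 10 FPGAs, so that designs can use most of the DSPs at high clock frequencies while avoiding congestion of the routing fabric.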
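One generic way to picture a three-dimensional arrangement is a grid of processing elements indexed by (i, j, layer), where the third index splits the reduction dimension of C = A·B into slices. The Python sketch below is a purely functional model under that assumption. It is not the authors' HLS code, the name matmul_3d_grid and the slice width dk are hypothetical, and it models neither timing nor data movement between elements.

```python
def matmul_3d_grid(A, B, dk):
    """Functional model of C = A * B on a hypothetical 3D grid.

    Element (i, j, layer) computes the partial dot product for the
    slice of the reduction index k owned by that layer; the partial
    sums are then accumulated along the third dimension.
    """
    n, m, p = len(A), len(B), len(B[0])
    layers = (m + dk - 1) // dk  # number of k-slices (ceiling division)
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for layer in range(layers):
                k0 = layer * dk
                k1 = min(k0 + dk, m)
                # Work assigned to element (i, j, layer).
                partial = sum(A[i][k] * B[k][j] for k in range(k0, k1))
                # Accumulate along the third (reduction) dimension.
                acc += partial
            C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
result = matmul_3d_grid(A, B, dk=2)
```

Any slice width dk gives the same product here. In hardware the choice would trade depth in the third dimension against the number of accumulation steps, which is the kind of parameter an actual design would tune.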
Findings
Achieves floating-point performance above 3 TFLOPS.
Uses 99% of the available DSPs.
Reaches high clock frequencies while avoiding congestion of the routing fabric.
Abstract
In this paper, we consider the HLS implementation of a three-dimensional systolic array architecture for matrix multiplication that targets specific characteristics of Intel Stratix 10 FPGAs. The goal is to produce designs that achieve a high floating-point throughput by using most of the DSPs at high frequencies while avoiding congestion of the routing fabric. The investigated three-dimensional systolic array architecture produces hardware designs that use 99% of the available DSPs, with maximum frequencies that yield performance above 3 TFLOPS.
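The two headline figures can be related with a rough back-of-the-envelope calculation. The sketch below assumes that each DSP block completes one single-precision multiply-add (2 FLOPs) per clock cycle and that the device has 5,760 DSP blocks. Neither the device variant nor its DSP count is stated above, so both are illustrative parameters, and the function names are hypothetical.

```python
def peak_tflops(num_dsps, utilization, freq_mhz, flops_per_dsp_per_cycle=2):
    """Peak throughput in TFLOPS for the given DSP usage and clock."""
    active = int(num_dsps * utilization)  # whole DSP blocks in use
    return active * flops_per_dsp_per_cycle * freq_mhz * 1e6 / 1e12


def min_freq_mhz(target_tflops, num_dsps, utilization, flops_per_dsp_per_cycle=2):
    """Clock frequency in MHz needed to reach target_tflops."""
    active = int(num_dsps * utilization)
    return target_tflops * 1e12 / (active * flops_per_dsp_per_cycle) / 1e6


ASSUMED_DSP_COUNT = 5760  # assumption: not given in the abstract

needed = min_freq_mhz(3.0, ASSUMED_DSP_COUNT, 0.99)
print(f"Clock needed for 3 TFLOPS: {needed:.1f} MHz")
```

Under these assumptions, 99% utilization corresponds to 5,702 active DSP blocks, and 3 TFLOPS requires a clock of roughly 263 MHz.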
Taxonomy
Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design
