Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis

Johannes de Fine Licht; Grzegorz Kwasniewski; Torsten Hoefler

arXiv:1912.06526 · cs.DC · January 26, 2021

TL;DR

This paper introduces a high-level synthesis approach for FPGA-based matrix multiplication that minimizes data movement and maximizes performance, supporting arbitrary data types and ensuring portability across FPGA devices.

Contribution

It presents a new model and architecture for FPGA matrix multiplication that optimizes I/O and performance using high-level synthesis, with an open-source implementation.

Findings

1. Achieves competitive performance that scales with both compute and memory resources.
2. Supports arbitrary data types through high-level synthesis.
3. Provides an open-source, portable FPGA matrix multiplication solution.

Abstract

Data movement is the dominating factor affecting performance and energy in modern computing systems. Consequently, many algorithms have been developed to minimize the number of I/O operations for common computing patterns. Matrix multiplication is no exception, and lower bounds have been proven and implemented both for shared and distributed memory systems. Reconfigurable hardware platforms are a lucrative target for I/O minimizing algorithms, as they offer full control of memory accesses to the programmer. While bounds developed in the context of fixed architectures still apply to these platforms, the spatially distributed nature of their computational and memory resources requires a decentralized approach to optimize algorithms for maximum hardware utilization. We present a model to optimize matrix multiplication for FPGA platforms, simultaneously targeting maximum performance and…
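To make the I/O argument concrete, the following is a minimal, generic sketch of tiled matrix multiplication that counts how many operand elements are fetched from slow memory. It is not the architecture from the paper or the code in the repository: the function name `tiled_matmul`, the `tile` parameter, and the load-counting scheme are all assumptions made for illustration. Each output tile is held in a small accumulator (standing in for fast on-chip memory) while a column slice of A and a row slice of B are streamed in for every step of the inner dimension.

```python
def tiled_matmul(a, b, tile):
    """Multiply a (n x k) by b (k x m), one tile x tile output block at a time.

    Returns the product and the number of operand elements read from the
    (simulated) slow memory. Illustrative only; names are hypothetical.
    """
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k and n % tile == 0 and m % tile == 0
    c = [[0] * m for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Accumulator for one output tile, standing in for on-chip memory.
            acc = [[0] * tile for _ in range(tile)]
            for p in range(k):
                # Stream in `tile` elements of A and `tile` elements of B.
                col = [a[i0 + i][p] for i in range(tile)]
                row = b[p][j0:j0 + tile]
                loads += 2 * tile
                # Outer-product update: every loaded element is reused `tile` times.
                for i in range(tile):
                    for j in range(tile):
                        acc[i][j] += col[i] * row[j]
            # Write the finished tile back once.
            for i in range(tile):
                c[i0 + i][j0:j0 + tile] = acc[i]
    return c, loads


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product_small, loads_small = tiled_matmul(a, b, tile=1)
product_large, loads_large = tiled_matmul(a, b, tile=2)
```

In this sketch the number of loads is 2·n·m·k / tile, so doubling the tile size halves the traffic to slow memory, at the cost of a larger accumulator. This trade-off between fast-memory capacity and data movement is the general idea behind communication-avoiding schedules.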

Code & Models

Repositories

spcl/gemm_hls (official)