FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
Yashael Faith Arthanto, David Ojika, Joo-Young Kim

TL;DR
This paper introduces FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. It reaches a peak bandwidth of 3813 MB/s with sub-microsecond remote access latency, and delivers nearly 2x speedup on matrix multiplication and convolution.
Contribution
It presents the first FPGA implementation of the core PGAS functions, based on the GASNet specification, combining high performance with compatibility for HPC systems.
Findings
Peak bandwidth of 3813 MB/s, over 95% of the theoretical maximum.
Remote write/read latency of 0.35 µs / 0.59 µs.
Speedup of nearly 2x in matrix multiplication and convolution.
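The bandwidth finding can be turned into a rough bound on the link's theoretical maximum. The short sketch below only rearranges the two reported numbers (3813 MB/s and "more than 95%"); the variable names are illustrative, and the actual theoretical maximum is not stated in this summary.

```python
# Reported figures from the Findings list above.
peak_bandwidth_mb_s = 3813.0   # measured peak bandwidth
min_efficiency = 0.95          # "over 95% of the theoretical maximum"

# If peak / theoretical > 0.95, then theoretical < peak / 0.95.
# The theoretical maximum also cannot be below the measured peak.
theoretical_upper_bound = peak_bandwidth_mb_s / min_efficiency
theoretical_lower_bound = peak_bandwidth_mb_s

print(f"theoretical maximum lies between {theoretical_lower_bound:.0f} "
      f"and {theoretical_upper_bound:.1f} MB/s")
```

In other words, the reported numbers imply a theoretical maximum somewhere between 3813 MB/s and roughly 4014 MB/s.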
Abstract
By providing highly efficient one-sided communication over a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are gaining attention as an alternative compute platform for HPC systems, offering custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not yet been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of…
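To make the PGAS idea in the abstract concrete, here is a minimal software-only sketch of a partitioned global address space with one-sided put and get. It is a toy model, not FSHMEM's interface and not the GASNet API: the class `ToyPGAS`, its methods, and the simple "node index times segment size plus offset" addressing scheme are all assumptions made for illustration.

```python
from typing import Tuple


class ToyPGAS:
    """Toy model: one memory segment per node, addressed through a single
    global address space. Hypothetical; not the FSHMEM or GASNet API."""

    def __init__(self, num_nodes: int, segment_size: int) -> None:
        self.segment_size = segment_size
        # Each node owns one partition of the global space.
        self.segments = [bytearray(segment_size) for _ in range(num_nodes)]

    def locate(self, global_addr: int) -> Tuple[int, int]:
        """Map a global address to (owning node, offset in its segment)."""
        node, offset = divmod(global_addr, self.segment_size)
        if not 0 <= node < len(self.segments):
            raise IndexError("global address outside the shared space")
        return node, offset

    def put(self, global_addr: int, data: bytes) -> None:
        """One-sided write: the target node runs no matching receive."""
        node, offset = self.locate(global_addr)
        if offset + len(data) > self.segment_size:
            raise ValueError("transfer crosses a segment boundary")
        self.segments[node][offset:offset + len(data)] = data

    def get(self, global_addr: int, nbytes: int) -> bytes:
        """One-sided read from whichever node owns the address."""
        node, offset = self.locate(global_addr)
        if offset + nbytes > self.segment_size:
            raise ValueError("transfer crosses a segment boundary")
        return bytes(self.segments[node][offset:offset + nbytes])


# Usage: write into node 2's partition, then read it back.
pgas = ToyPGAS(num_nodes=4, segment_size=1024)
addr = 2 * 1024 + 16
pgas.put(addr, b"hi")
print(pgas.locate(addr), pgas.get(addr, 2))
```

The point of the sketch is the contrast with message passing: the initiator names a remote location directly and the owner of that memory takes no explicit action, which is the kind of operation the paper reports moving into FPGA hardware.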
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Distributed and Parallel Computing Systems
