Benchmarking High Bandwidth Memory on FPGAs
Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso

TL;DR
This paper benchmarks High Bandwidth Memory (HBM) on FPGAs with a custom tool, showing how much bandwidth HBM can deliver and how strongly memory usage patterns affect it, and compares HBM with DDR4 to guide memory selection.
Contribution
The paper introduces Shuhai, a benchmarking tool for HBM on FPGAs that provides detailed, accurate performance measurements and enables comparisons with other memory types.
Findings
HBM achieves up to 425 GB/s of bandwidth
Memory usage patterns significantly affect performance
Shuhai enables precise, deterministic benchmarking
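To put the 425 GB/s figure in context, the arithmetic below compares it against a theoretical peak. This is a rough sketch, not the paper's method: the channel count, bus width, and clock frequency are illustrative assumptions that are not stated on this page, and the helper name peak_bandwidth_gbs is hypothetical. Only the measured value comes from the findings above.

```python
def peak_bandwidth_gbs(channels: int, bus_bits: int, clock_mhz: float) -> float:
    """Theoretical peak in GB/s (1 GB = 1e9 bytes).

    Assumes every channel moves one full bus word per clock cycle.
    """
    bytes_per_cycle = bus_bits / 8
    return channels * bytes_per_cycle * clock_mhz * 1e6 / 1e9


# ASSUMPTIONS for illustration only; none of these appear in the summary above.
ASSUMED_CHANNELS = 32      # independent memory channels
ASSUMED_BUS_BITS = 256     # data width per channel
ASSUMED_CLOCK_MHZ = 450.0  # channel clock

# Taken from the findings list above.
MEASURED_GBS = 425.0

peak = peak_bandwidth_gbs(ASSUMED_CHANNELS, ASSUMED_BUS_BITS, ASSUMED_CLOCK_MHZ)
efficiency = MEASURED_GBS / peak

print(f"assumed peak: {peak:.1f} GB/s, measured/peak: {efficiency:.1%}")
```

Under these assumed parameters the measured bandwidth would sit within about ten percent of the peak; with different channel parameters the ratio changes accordingly.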
Abstract
FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state. However, the performance characteristics of HBM are still not well specified, especially in the context of FPGAs. In this paper, we bridge the gap between nominal specifications and actual performance by benchmarking HBM on a state-of-the-art FPGA, i.e., a Xilinx Alveo U280 featuring a two-stack HBM subsystem. To this end, we propose Shuhai, a benchmarking tool that allows us to demystify all the underlying details of HBM on an FPGA. FPGA-based benchmarking should also provide a more accurate picture of HBM than doing so on CPUs/GPUs, since CPUs/GPUs are noisier systems due to their complex control logic and cache hierarchy. Since the memory itself is complex, leveraging…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Network Packet Processing and Optimization
