When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization
Young-kyu Choi, Yuze Chi, Jie Wang, Licheng Guo, Jason Cong

TL;DR
This paper benchmarks three HBM FPGA boards, analyzes the overhead introduced by HLS tools, and proposes optimization techniques that raise effective bandwidth by 2.4X-3.8X for memory-bound applications.
Contribution
The paper introduces HLS-based optimization techniques that make better use of HBM bandwidth on FPGA boards, targeting the cases where a PE accesses multiple HBM channels or multiple PEs access a single HBM channel.
Findings
The proposed techniques improve effective bandwidth by 2.4X-3.8X.
Microbenchmarks characterize the performance of three HBM FPGA boards: Intel's Stratix 10 MX and Xilinx's Alveo U50 and U280.
The analysis offers insights for improving future HLS design flows for HBM FPGAs.
Abstract
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bound applications to benefit from FPGA acceleration. However, we found that it is not easy to fully utilize the available bandwidth when developing some applications with high-level synthesis (HLS) tools. This is due to the limitations of existing HLS tools when accessing the large number of independent external memory channels on an HBM board. In this paper, we measure the performance of three recent representative HBM FPGA boards (Intel's Stratix 10 MX and Xilinx's Alveo U50/U280 boards) with microbenchmarks and analyze the HLS overhead. Next, we propose HLS-based optimization techniques to improve the effective bandwidth when a PE accesses multiple HBM channels or when multiple PEs access a single HBM channel. Our experiment demonstrates…
Taxonomy
Topics: Interconnection Networks and Systems · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques
