The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface
Hamid Reza Zohouri, Satoshi Matsuoka

TL;DR
This paper benchmarks and analyzes the memory controller behavior of Intel FPGA SDK for OpenCL, revealing shortcomings that limit memory performance and proposing workarounds to enhance efficiency.
Contribution
It provides a comprehensive analysis of the memory interface generated by Intel FPGA SDK for OpenCL, identifying key limitations and suggesting practical workarounds for improved memory bandwidth utilization.
Findings
Memory access alignment issues hinder performance
Workarounds can improve memory bandwidth efficiency
Major controller redesign needed for optimal performance
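As a rough illustration of the alignment finding, the sketch below shows how padding a row length can keep every row of a 2D buffer starting on an aligned address. It is not taken from the paper: the function names, the 4-byte element size, and the 64-byte alignment boundary are all assumptions chosen for the example, and the buffer's base address is assumed to be aligned already.

```python
def padded_row_length(row_elems: int, elem_bytes: int = 4, align_bytes: int = 64) -> int:
    """Round a row length (in elements) up to a whole number of aligned blocks.

    elem_bytes and align_bytes are illustrative assumptions, not values
    reported in the paper.
    """
    if align_bytes % elem_bytes != 0:
        raise ValueError("align_bytes must be a multiple of elem_bytes")
    elems_per_block = align_bytes // elem_bytes
    # Ceiling division, then scale back up to elements.
    blocks = -(-row_elems // elems_per_block)
    return blocks * elems_per_block


def row_start_is_aligned(row: int, row_len: int, elem_bytes: int = 4, align_bytes: int = 64) -> bool:
    """Check whether a row's byte offset from an aligned base is itself aligned."""
    return (row * row_len * elem_bytes) % align_bytes == 0


if __name__ == "__main__":
    width = 1000
    padded = padded_row_length(width)
    print("unpadded row 1 aligned:", row_start_is_aligned(1, width))
    print("padded width:", padded)
    print("padded row 1 aligned:", row_start_is_aligned(1, padded))
```

With these assumed sizes, an unpadded width of 1000 elements leaves odd-numbered rows starting mid-block, while the padded width puts every row on a block boundary at the cost of a few unused elements per row.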
Abstract
Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for different applications on FPGAs using HLS. However, a comprehensive analysis of the behavior and efficiency of the memory controller of FPGAs is missing in literature, which becomes even more crucial when the limited memory bandwidth of modern FPGAs compared to their GPU counterparts is taken into account. In this work, we will analyze the memory interface generated by Intel FPGA SDK for OpenCL with different configurations for input/output arrays, vector size, interleaving, kernel programming model, on-chip channels, operating frequency, padding, and multiple types of overlapped blocking. Our results point to multiple shortcomings in the memory…
