Exploring the acceleration of Nekbone on reconfigurable architectures
Nick Brown

TL;DR
This paper demonstrates how redesigning the Nekbone HPC mini-app for FPGAs using high-level synthesis can significantly improve performance and power efficiency compared to CPUs and GPUs, highlighting the potential of reconfigurable architectures.
Contribution
It presents a novel FPGA implementation of Nekbone's AX kernel with optimization strategies, achieving high performance and power efficiency, and compares it against CPU and GPU benchmarks.
Findings
FPGA outperforms CPU by 4x
FPGA achieves nearly 75% of GPU performance
Significant power efficiency gains on FPGA
Abstract
Hardware technological advances are struggling to match scientific ambition, and a key question is how we can use the transistors that we already have more effectively. This is especially true for HPC, where the tendency is often to throw computation at a problem, whereas codes themselves are commonly bound, at least to some extent, by other factors. By redesigning an algorithm and moving from a Von Neumann to a dataflow style, there is potentially more opportunity to address these bottlenecks on reconfigurable architectures than on more general-purpose architectures. In this paper we explore porting the AX kernel of Nekbone, a widely popular HPC mini-app, to FPGAs using High-Level Synthesis via Vitis. Whilst computation is an important part of this code, it is also memory bound on CPUs, and a key question is whether one can ameliorate this by leveraging FPGAs. We first…
