NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gomez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal

TL;DR
NERO is an FPGA+HBM-based near-memory accelerator that significantly improves performance and energy efficiency for weather prediction kernels compared to traditional CPU systems.
Contribution
This paper introduces NERO, a novel FPGA-based accelerator with high-bandwidth memory designed specifically for weather prediction stencil computations.
Findings
NERO outperforms a POWER9 CPU system by 4.2x and 8.3x on two compound stencil kernels.
NERO reduces energy consumption by over 20x relative to the same POWER9 system.
NERO achieves an energy efficiency of up to 17.3 GFLOPS/Watt.
Abstract
Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through IBM CAPI2 (Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a…
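To make the terms "compound stencil" and "low arithmetic intensity" concrete, here is a minimal pure-Python sketch. It is not one of NERO's kernels and not the paper's implementation: the function names (`laplacian`, `compound_stencil`, `flops_per_byte`), the choice of a 5-point Laplacian, and the 4-byte value size are all assumptions made for illustration. The compound stencil chains two elementary stencils, so the second stage reads the output of the first, and the estimate counts memory traffic as if no value were reused from a cache.

```python
def laplacian(field):
    """Apply a 5-point Laplacian to interior points; boundary points stay 0.0."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Four neighbor reads plus the center: 4 add/sub and 1 multiply.
            out[i][j] = (field[i - 1][j] + field[i + 1][j]
                         + field[i][j - 1] + field[i][j + 1]
                         - 4.0 * field[i][j])
    return out


def compound_stencil(field):
    """Hypothetical compound stencil: the second stage consumes the first stage's output."""
    return laplacian(laplacian(field))


def flops_per_byte(flops_per_point=5, loads_per_point=5,
                   stores_per_point=1, bytes_per_value=4):
    """Rough arithmetic intensity of one stage, assuming no data reuse (illustrative)."""
    traffic = (loads_per_point + stores_per_point) * bytes_per_value
    return flops_per_point / traffic


if __name__ == "__main__":
    # Hypothetical 5x5 input whose value depends only on the row index.
    grid = [[float(i * i) for _ in range(5)] for i in range(5)]
    print(compound_stencil(grid)[2][2])
    print(flops_per_byte())
```

Under these assumptions each stage performs only a handful of floating-point operations per value moved, which is why such kernels tend to be limited by memory bandwidth rather than compute.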
