Towards Portability at Scale: A Cross-Architecture Performance Evaluation of a GPU-enabled Shallow Water Solver
Johansell Villalobos, Daniel Caviedes-Voullième, Silvio Rizzi, Esteban Meneses

TL;DR
This study evaluates the scalability and portability of a GPU-accelerated shallow water solver across multiple heterogeneous HPC systems, revealing high efficiency and identifying memory bandwidth as a key bottleneck, with room for further optimization.
Contribution
It provides a comprehensive performance analysis of SERGHEI-SWE across diverse architectures, demonstrating its scalability and portability with detailed metrics and insights for future improvements.
Findings
Achieves a speedup of 32 and efficiency upwards of 90% across almost all of the tested range on large-scale GPU systems.
Memory bandwidth is the primary bottleneck for performance.
Performance portability varies with problem size and stays under 70% with current tuning.
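The speedup, efficiency, and portability figures above can be made concrete with a small sketch. It assumes the usual definitions of strong-scaling speedup and efficiency relative to a baseline run, and it assumes the portability score is the harmonic mean of per-platform efficiencies (the metric proposed by Pennycook et al.); this summary does not state which definition the paper uses. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def strong_scaling(t_base: float, n_base: int, t_n: float, n: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) of a run on n GPUs relative to a baseline on n_base GPUs.

    The problem size is fixed; only the GPU count changes.
    """
    speedup = t_base / t_n
    # Ideal speedup equals the ratio of GPU counts.
    efficiency = speedup / (n / n_base)
    return speedup, efficiency


def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """Weak-scaling efficiency: work per GPU is fixed, so the ideal runtime is constant."""
    return t_base / t_n


def performance_portability(efficiencies: Sequence[float]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    Assumed metric: returns 0.0 if the list is empty or any platform
    is unsupported (efficiency <= 0).
    """
    if not efficiencies or any(e <= 0.0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the paper.
    s, e = strong_scaling(t_base=64.0, n_base=1, t_n=2.0, n=32)
    print(f"speedup={s:.1f}, efficiency={e:.2f}")
    # Hypothetical per-platform efficiencies for four systems.
    print(f"PP={performance_portability([0.9, 0.7, 0.6, 0.5]):.2f}")
```

Because the harmonic mean is dominated by the weakest platform, a single poorly tuned architecture is enough to pull the aggregate score well below the best per-platform efficiency.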
Abstract
Current climate change has posed a grand challenge in the field of numerical modeling due to its complex, multiscale dynamics. In hydrological modeling, the increasing demand for high-resolution, real-time simulations has led to the adoption of GPU-accelerated platforms and performance portable programming frameworks such as Kokkos. In this work, we present a comprehensive performance study of the SERGHEI-SWE solver, a shallow water equations code, across four state-of-the-art heterogeneous HPC systems: Frontier (AMD MI250X), JUWELS Booster (NVIDIA A100), JEDI (NVIDIA H100), and Aurora (Intel Max 1550). We assess strong scaling up to 1024 GPUs and weak scaling upwards of 2048 GPUs, demonstrating consistent scalability with a speedup of 32 and an efficiency upwards of 90% across almost all of the test range. Roofline analysis reveals that memory bandwidth is the dominant performance…
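As a rough illustration of what a roofline analysis checks, the sketch below applies the standard roofline bound: attainable performance is the minimum of peak compute and arithmetic intensity times memory bandwidth. The function names and all numbers are hypothetical placeholders, not specifications of the GPUs named in the abstract or values from the paper.

```python
def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline bound on performance (FLOP/s).

    intensity  : arithmetic intensity of the kernel, in FLOP per byte moved
    peak_flops : peak compute throughput of the device, in FLOP/s
    bandwidth  : peak memory bandwidth of the device, in bytes/s
    """
    return min(peak_flops, intensity * bandwidth)


def is_memory_bound(intensity: float, peak_flops: float, bandwidth: float) -> bool:
    """A kernel is memory-bound when its intensity lies below the ridge point."""
    ridge_point = peak_flops / bandwidth  # FLOP/byte where the two limits meet
    return intensity < ridge_point


if __name__ == "__main__":
    # Hypothetical device: 10 TFLOP/s peak compute, 2 TB/s memory bandwidth.
    peak, bw = 10.0e12, 2.0e12
    # Hypothetical kernel with low arithmetic intensity.
    ai = 0.5
    print(attainable_flops(ai, peak, bw), is_memory_bound(ai, peak, bw))
```

A kernel whose intensity sits left of the ridge point cannot reach peak compute no matter how well it is tuned, which is what it means for memory bandwidth to be the limiting factor.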
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Numerical Methods in Computational Mathematics · Numerical Methods and Algorithms
