Multi-Partner Project: Multi-GPU Performance Portability Analysis for CFD Simulations at Scale
Panagiotis-Eleftherios Eleftherakis (1), George Anagnostopoulos (1), Anastassis Kapetanakis (1), Mohammad Umair (2), Jean-Yves Vet (3), Konstantinos Iliakis (1), Jonathan Vincent (2), Jing Gong (2), Akshay Patil (4), Clara García-Sánchez (4), Gerardo Zampino (2)

TL;DR
This paper evaluates the performance portability of a CFD simulation framework across AMD and NVIDIA GPUs, analyzing hardware, software, and application factors to optimize multi-GPU scalability and efficiency.
Contribution
It provides a comprehensive multi-level analysis of performance variability and optimization strategies for GPU-accelerated CFD simulations across different architectures.
Findings
Memory access optimizations change performance by factors ranging from 0.69× (a slowdown) to 3.91× (a speedup).
Single-GPU performance varies significantly across architectures and compiler stacks.
Multi-GPU scalability is limited by hardware and software factors, requiring informed tuning.
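The factors quoted in the findings above are ratios of run times. The following is a minimal sketch, assuming the common definition of speedup as baseline time divided by optimized time; the summary does not state which definition the paper uses. The function names and the timings are hypothetical and were chosen only to reproduce the two quoted extremes.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return baseline time divided by optimized time (assumed definition)."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


def classify(factor: float) -> str:
    """Label a speedup factor: above 1 is faster, below 1 is slower."""
    if factor > 1.0:
        return "speedup"
    if factor < 1.0:
        return "slowdown"
    return "no change"


# Hypothetical (baseline, optimized) timings in seconds, not measured data.
hypothetical_timings = {
    "case_a": (6.9, 10.0),   # optimized run is slower than the baseline
    "case_b": (39.1, 10.0),  # optimized run is faster than the baseline
}

for name, (baseline, optimized) in hypothetical_timings.items():
    factor = speedup(baseline, optimized)
    print(f"{name}: {factor:.2f}x ({classify(factor)})")
```

Under this definition the same optimization can land on either side of 1.0 depending on the platform, which is why a factor such as 0.69× counts as a regression rather than a gain.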
Abstract
As heterogeneous supercomputing architectures leveraging GPUs become increasingly central to high-performance computing (HPC), it is crucial for computational fluid dynamics (CFD) simulations, a de facto HPC workload, to efficiently utilize such hardware. One of the key challenges of HPC codes is performance portability, i.e. the ability to maintain near-optimal performance across different accelerators. In the context of the REFMAP project, which targets scalable, GPU-enabled multi-fidelity CFD for urban airflow prediction, this paper analyzes the performance portability of SOD2D, a state-of-the-art Spectral Elements simulation framework, across AMD and NVIDIA GPU architectures. We first discuss the physical and numerical models underlying SOD2D, highlighting its computational hotspots. Then, we examine its performance and scalability in a multi-level manner, i.e. defining and…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Lattice Boltzmann Simulation Studies
