Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks
Johansell Villalobos, Josef Ruzicka, Silvio Rizzi

TL;DR
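This paper compares four performance portability frameworks (Kokkos, OpenMP, RAJA, and OCCA) on two multi-GPU scientific miniapps, an N-body simulation and a structured grid simulation. Preliminary time-to-solution results on a single Polaris node with four NVIDIA A100 GPUs reveal variability across frameworks and areas for optimization.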
Contribution
It provides preliminary time-to-solution results showing how each framework performs for scientific applications on a multi-GPU system, highlighting the strengths and limitations of each.
Findings
OCCA shows faster execution for small problems due to JIT compilation.
OpenMP performs poorly in inter-node communication scenarios.
Framework performance varies significantly, indicating a need for further optimization.
Abstract
Scientific computing in the exascale era demands increased computational power to solve complex problems across various domains. With the rise of heterogeneous computing architectures, the need for vendor-agnostic, performance portability frameworks has been highlighted. Libraries like Kokkos have become essential for enabling high-performance computing applications to execute efficiently across different hardware platforms with minimal code changes. In this direction, this paper presents preliminary time-to-solution results for two representative scientific computing applications: an N-body simulation and a structured grid simulation. Both applications used a distributed memory approach and hardware acceleration through four performance portability frameworks: Kokkos, OpenMP, RAJA, and OCCA. Experiments conducted on a single node of the Polaris supercomputer using four NVIDIA A100 GPUs…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
