Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes
William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, Valentin Churavy

TL;DR
This paper evaluates the performance and portability of high-level programming models Julia, Python/Numba, and Kokkos on exascale HPC nodes, comparing them to traditional implementations across CPUs and GPUs.
Contribution
It provides a comparative analysis of high-level programming models' performance and portability on diverse exascale hardware, highlighting strengths and limitations.
Findings
Julia and Kokkos perform comparably with C/OpenMP on CPUs.
Julia implementations are competitive with CUDA and HIP on GPUs.
Performance gaps exist for Julia's single precision and Kokkos on NVIDIA A100 GPUs.
Abstract
We explore the performance and portability of three high-level programming models, the LLVM-based Julia and Python/Numba, and Kokkos, on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphics processing units (GPUs) on Frontier's test bed Crusher system, and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facilities. We compare the default performance of a hand-rolled dense matrix multiplication algorithm on CPUs against vendor-compiled C/OpenMP implementations, and on each GPU against CUDA and HIP. Rather than focusing on kernel optimization per se, we select this naive approach to resemble exploratory work in science and as a lower bound for performance to isolate the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with…
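For readers unfamiliar with the kind of kernel the abstract calls "hand-rolled" and "naive", the following is a minimal sketch of a textbook triple-loop dense matrix multiplication in plain Python. It is an illustration only and not the paper's code: the function name `naive_matmul`, the list-of-lists storage, and the absence of any Numba, threading, or GPU offload are all assumptions made to keep the example self-contained.

```python
def naive_matmul(a, b):
    """Textbook triple-loop product C = A * B for list-of-lists matrices.

    Hypothetical helper for illustration; no blocking, tiling, or
    parallelism is applied, which is what makes the kernel "naive".
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])

    # Result matrix initialised to zeros (m rows, n columns).
    c = [[0.0] * n for _ in range(m)]

    for i in range(m):          # rows of A
        for j in range(n):      # columns of B
            acc = 0.0
            for p in range(k):  # shared inner dimension
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each programming model in the comparison would express these same three loops in its own syntax and parallelize the outer iterations, so a sketch like this mainly conveys how little kernel-level tuning is involved.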
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
