Julia as a unifying end-to-end workflow language on the Frontier exascale system

William F. Godoy; Pedro Valero-Lara; Caira Anderson; Katrina W. Lee; Ana Gainaru; Rafael Ferreira da Silva; Jeffrey S. Vetter

arXiv:2309.10292·cs.DC·September 29, 2023

TL;DR

This paper evaluates Julia as a unified language for high-performance computing workflows on the Frontier exascale system, demonstrating its capabilities and limitations in GPU computing, scaling, and I/O performance.

Contribution

It demonstrates Julia's viability as a high-performance, end-to-end workflow language on exascale systems, highlighting performance trade-offs and integration with HPC components.

Findings

1. Julia's MPI and parallel I/O bindings add near-zero overhead.

2. Julia's GPU stencil kernel shows a nearly 50% performance gap relative to native AMD HIP codes.

3. Julia scales effectively in weak scaling up to 4,096 MPI processes/GPUs (512 nodes).

Abstract

We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, 2-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD's MI250x GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates a reasonable LLVM-IR, a nearly 50% performance difference exists vs. native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using MPI and parallel I/O bindings for system-wide installed…
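To make the abstract's "7-point stencil kernel" concrete, here is a minimal sketch of one Gray-Scott time step in plain Python. It is not the paper's Julia or HIP code. It assumes the textbook form of the model (du/dt = Du ∇²u − u v² + F(1 − u), dv/dt = Dv ∇²v + u v² − (F + k) v), unit grid spacing, forward-Euler time stepping, and fixed boundary cells. The function names `laplacian7`, `make_field`, and `gray_scott_step`, and all parameter values, are hypothetical and chosen only for illustration.

```python
def laplacian7(f, i, j, k):
    """7-point stencil: six face neighbors minus six times the center.

    Assumes unit grid spacing (an illustrative choice, not from the paper).
    """
    return (f[i - 1][j][k] + f[i + 1][j][k]
            + f[i][j - 1][k] + f[i][j + 1][k]
            + f[i][j][k - 1] + f[i][j][k + 1]
            - 6.0 * f[i][j][k])


def make_field(n, value):
    """Build an n x n x n nested-list field filled with a constant value."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def gray_scott_step(u, v, Du, Dv, F, k, dt):
    """One forward-Euler step on interior cells; boundary cells are held fixed."""
    n = len(u)
    # Copy the fields so every update reads only values from the previous step.
    u_new = [[row[:] for row in plane] for plane in u]
    v_new = [[row[:] for row in plane] for plane in v]
    for x in range(1, n - 1):
        for y in range(1, n - 1):
            for z in range(1, n - 1):
                uc, vc = u[x][y][z], v[x][y][z]
                reaction = uc * vc * vc
                u_new[x][y][z] = uc + dt * (
                    Du * laplacian7(u, x, y, z) - reaction + F * (1.0 - uc))
                v_new[x][y][z] = vc + dt * (
                    Dv * laplacian7(v, x, y, z) + reaction - (F + k) * vc)
    return u_new, v_new


if __name__ == "__main__":
    # Hypothetical parameters; seed a small perturbation in the center of v.
    n = 8
    u, v = make_field(n, 1.0), make_field(n, 0.0)
    v[n // 2][n // 2][n // 2] = 0.25
    u, v = gray_scott_step(u, v, Du=0.2, Dv=0.1, F=0.02, k=0.048, dt=1.0)
    print(u[n // 2][n // 2][n // 2], v[n // 2][n // 2][n // 2])
```

Each interior cell reads seven values per field and performs only a handful of arithmetic operations, which is why a kernel of this shape tends to be limited by memory bandwidth rather than compute. In a GPU version, the triple loop would typically map to one thread per cell.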

Code & Models

Repositories

juliaornl/grayscott.jl (official)
