Productivity meets Performance: Julia on A64FX
Mosè Giordano, Milan Klöwer, Valentin Churavy

TL;DR
This paper evaluates the Julia programming language on the Fujitsu A64FX processor, covering reduced-precision arithmetic, MPI scalability and throughput on Fugaku, and a full application (ShallowWaters.jl), and argues that Julia combines productivity with performance on this hardware.
Contribution
It provides the first comprehensive performance analysis of Julia on A64FX, including reduced precision, MPI scalability, and complex application support.
Findings
Julia matches tuned library performance in axpy benchmarks.
MPI overheads in Julia are negligible on Fugaku.
Julia effectively supports various floating-point precisions in complex models.
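As a rough illustration of the axpy finding and of targeting several floating-point precisions, the sketch below writes one generic kernel for y = a*x + y and runs it with Float16, Float32, and Float64. It is not the benchmark code from the paper; the name `my_axpy!` is hypothetical, and whether the loop is vectorized depends on the Julia version and the target CPU.

```julia
# Generic axpy kernel: y[i] = a * x[i] + y[i] for any floating-point type T.
# `my_axpy!` is a hypothetical name used only for this sketch.
function my_axpy!(a::T, x::AbstractVector{T}, y::AbstractVector{T}) where {T<:AbstractFloat}
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    # @simd permits the compiler to vectorize the loop; it does not guarantee it.
    @inbounds @simd for i in eachindex(x, y)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end

# The same source is specialized by the compiler for each element type.
for T in (Float16, Float32, Float64)
    x = ones(T, 8)
    y = fill(T(2), 8)
    my_axpy!(T(3), x, y)
    println(T, " => ", y[1])
end
```

In an interactive session, `@code_native my_axpy!(1f0, ones(Float32, 8), ones(Float32, 8))` shows the machine code generated for the Float32 specialization on the local machine.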
Abstract
The Fujitsu A64FX ARM-based processor is used in supercomputers such as Fugaku in Japan and Isambard 2 in the UK and provides an interesting combination of hardware features, such as the Scalable Vector Extension (SVE) and native support for reduced-precision floating-point arithmetic. The goal of this paper is to explore the performance of the Julia programming language on the A64FX processor, with a particular focus on reduced precision. We present a performance study on axpy to verify the compilation pipeline, demonstrating that Julia can match the performance of tuned libraries. Additionally, we present an analysis of Message Passing Interface (MPI) scalability and throughput on Fugaku, showing next to no overhead from Julia or its MPI interface. To explore the usability of Julia to target various floating-point precisions, we present results of ShallowWaters.jl, a shallow…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Distributed and Parallel Computing Systems
