Distributed Parallelization of xPU Stencil Computations in Julia
Samuel Omlin (1), Ludovic Räss (2, 3), Ivan Utkin (2, 3) ((1) Swiss National Supercomputing Centre (CSCS), ETH Zurich, Lugano, Switzerland, (2) Laboratory of Hydraulics, Hydrology, Glaciology (VAW), ETH Zurich, Zurich, Switzerland, (3) Swiss Federal Institute for Forest

TL;DR
This paper introduces a simple method for distributed parallelization of stencil computations on GPUs using Julia, enabling efficient scaling and communication hiding for large-scale applications.
Contribution
It presents an approach, implemented in the package ImplicitGlobalGrid.jl, for scalable distributed stencil computations on GPUs that lets communication be hidden behind computation.
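A distributed stencil code never needs to store the global grid explicitly: its extent follows from the local array size and the number of processes, provided neighbouring local arrays overlap by a fixed number of cells. The sketch below illustrates this bookkeeping in 1-D, assuming non-periodic boundaries and an overlap of two cells (the shared cells are the ones a halo update keeps consistent). The function names `global_size`, `local_to_global` and `owners` and the overlap value are hypothetical choices for illustration, not ImplicitGlobalGrid.jl's API.

```python
"""Hypothetical sketch: local arrays on `nprocs` processes implicitly define a
1-D global grid when neighbouring local arrays overlap by `overlap` cells.
Non-periodic boundaries and all names here are illustrative assumptions."""


def global_size(nx_local, nprocs, overlap=2):
    """Number of distinct global cells covered by the overlapping local arrays."""
    return nprocs * (nx_local - overlap) + overlap


def local_to_global(ix, rank, nx_local, overlap=2):
    """0-based global index of 0-based local cell `ix` on process `rank`."""
    return rank * (nx_local - overlap) + ix


def owners(ig, nprocs, nx_local, overlap=2):
    """All (rank, local index) pairs that hold global cell `ig`."""
    stride = nx_local - overlap
    return [(r, ig - r * stride)
            for r in range(nprocs)
            if 0 <= ig - r * stride < nx_local]
```

For example, with 3 processes of 6 local cells each, `global_size(6, 3)` is 14, and global cell 4 is held both by rank 0 (local cell 4) and by rank 1 (local cell 0). Cells held by two ranks are exactly where halo data must be exchanged.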
Findings
Achieves close-to-ideal weak scaling on thousands of GPUs
Hides communication costs behind computation
Demonstrates this scalability with real-world applications
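Weak scaling keeps the problem size per GPU fixed while the number of GPUs grows, so ideal weak scaling means a constant wall time per step. A common way to quantify "close-to-ideal" is the parallel efficiency E(N) = T(1) / T(N) relative to a single-device reference run. The sketch below computes it; the function name is our own, and the timings in the demo are invented placeholders, not results from the paper.

```python
def weak_scaling_efficiency(t_ref, t_n):
    """Weak-scaling parallel efficiency E = t_ref / t_n.

    t_ref: wall time per step of the reference run (e.g. one device).
    t_n:   wall time per step on N devices with the same per-device problem size.
    E == 1.0 is ideal weak scaling; smaller values mean overhead grew with N.
    """
    if t_ref <= 0 or t_n <= 0:
        raise ValueError("times must be positive")
    return t_ref / t_n


if __name__ == "__main__":
    # Hypothetical timings in seconds per step: placeholders, NOT data from the paper.
    timings = {1: 2.00, 8: 2.04, 64: 2.10, 512: 2.15}
    for n, t in timings.items():
        print(f"{n:4d} devices: E = {weak_scaling_efficiency(timings[1], t):.3f}")
```

Using wall time per step rather than total runtime keeps the metric independent of how many time steps a benchmark happens to run.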
Abstract
We present a straightforward approach for distributed parallelization of stencil-based xPU applications on a regular staggered grid, which is instantiated in the package ImplicitGlobalGrid.jl. The approach makes it possible to leverage remote direct memory access (RDMA) and enables close-to-ideal weak scaling of real-world applications on thousands of GPUs. The communication costs can easily be hidden behind computation.
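One common way to hide communication behind computation is to update the cells next to each subdomain boundary first, start the halo exchange for those values asynchronously, and update the remaining inner cells while the exchange is in flight. The sketch below shows this ordering for a 1-D explicit diffusion stencil under our own assumptions: each "rank" is a Python list with one halo cell per side, and a background thread stands in for non-blocking (e.g. RDMA) transfers. All names are hypothetical; this is not ImplicitGlobalGrid.jl's API, and in CPython the thread only illustrates the data dependencies, it does not speed anything up.

```python
"""Single-process sketch of halo exchange with communication hiding for a
1-D explicit diffusion stencil. Illustrative only: no MPI or GPU code."""
from concurrent.futures import ThreadPoolExecutor


def stencil(u, i, alpha):
    # 3-point explicit diffusion update of cell i.
    return u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])


def split_with_halos(u_global, nranks):
    # Each rank owns m interior cells plus one halo (or global boundary) cell per side.
    m = (len(u_global) - 2) // nranks
    assert m * nranks == len(u_global) - 2, "assume an even split"
    return [u_global[r * m : r * m + m + 2] for r in range(nranks)]


def exchange_halos(ranks):
    # Copy each rank's outermost interior cells into its neighbours' halo cells.
    for left, right in zip(ranks, ranks[1:]):
        left[-1] = right[1]
        right[0] = left[-2]


def step_with_hiding(old, alpha, pool):
    new = [u[:] for u in old]  # copies keep the global boundary values
    m = len(old[0]) - 2
    # 1) Update the boundary cells first: neighbours need exactly these values.
    for u, v in zip(old, new):
        for i in {1, m}:
            v[i] = stencil(u, i, alpha)
    # 2) Start the halo exchange in the background.
    pending = pool.submit(exchange_halos, new)
    # 3) Meanwhile update the inner cells; they only read last step's values,
    #    and the exchange only touches boundary and halo cells, so there is no race.
    for u, v in zip(old, new):
        for i in range(2, m):
            v[i] = stencil(u, i, alpha)
    # 4) Wait for the exchange so halos are current for the next step.
    pending.result()
    return new


def run_distributed(u_global, nranks, alpha, nsteps):
    ranks = split_with_halos(list(u_global), nranks)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(nsteps):
            ranks = step_with_hiding(ranks, alpha, pool)
    # Reassemble: left boundary, every rank's interior, right boundary.
    return [ranks[0][0]] + [x for u in ranks for x in u[1:-1]] + [ranks[-1][-1]]


def run_single(u_global, alpha, nsteps):
    # Reference: the same stencil applied to one undivided array.
    u = list(u_global)
    for _ in range(nsteps):
        u = [u[0]] + [stencil(u, i, alpha) for i in range(1, len(u) - 1)] + [u[-1]]
    return u
```

Because the decomposed run performs the same floating-point operations on the same operands as the single-array run, the two agree bit for bit for any even split, e.g. with `u0 = [0.0] * 18; u0[9] = 1.0`, `run_distributed(u0, 4, 0.25, 20) == run_single(u0, 0.25, 20)` holds.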
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
