Scalable communication for high-order stencil computations using CUDA-aware MPI

Johannes Pekkilä; Miikka S. Väisälä; Maarit J. Käpylä; Matthias Rheinhardt; Oskar Lappi

arXiv:2103.01597·cs.DC·May 11, 2022

TL;DR

This paper presents a CUDA-aware MPI communication scheme for high-order stencil computations, significantly improving scalability and efficiency in GPU-based magnetohydrodynamics simulations.

Contribution

It introduces a generic GPU communication scheme using CUDA-aware MPI that enhances intra-node locality and scales efficiently across multiple GPUs for high-order stencil computations.

Findings

01 Strong scaling from 1 to 64 GPUs at 50-87% efficiency

02 20-60x speedup over CPU solvers

03 9-12x energy efficiency improvement on 16 nodes
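The efficiency figure in the first finding can be read as measured speedup divided by ideal speedup. Below is a minimal sketch of that arithmetic, assuming the usual definition of strong-scaling efficiency (this page does not state the paper's exact definition). The function name and the timings are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Parallel efficiency of a fixed-size problem on n devices,
    relative to a baseline run on n_base devices."""
    if min(t_base, t_n) <= 0 or min(n_base, n) <= 0:
        raise ValueError("timings and device counts must be positive")
    speedup = t_base / t_n   # how much faster the larger run is
    ideal = n / n_base       # speedup expected under perfect scaling
    return speedup / ideal


# Hypothetical timings (seconds per step), not taken from the paper.
eff = strong_scaling_efficiency(t_base=64.0, n_base=1, t_n=2.0, n=64)
print(f"{eff:.0%}")  # 50%
```

An efficiency of 100% would mean the run on 64 devices is exactly 64 times faster than the single-device run; communication overhead is what typically pushes the value below that.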

Abstract

Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has increased faster than memory and network bandwidths, optimizing data movement has become critical for achieving strong scaling in many communication-heavy applications. This performance gap has been further accentuated by the introduction of graphics processing units, which can provide several times higher throughput in data-parallel tasks than central processing units. In this work, we explore the computational aspects of iterative stencil loops and implement a generic communication scheme using CUDA-aware MPI, which we use to accelerate magnetohydrodynamics simulations based on high-order finite differences and third-order Runge-Kutta integration. We put particular focus on improving…
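To illustrate why high-order stencils are communication-heavy, the sketch below splits a periodic 1D field across several hypothetical ranks and pads each block with halo cells from its neighbours before applying a central-difference stencil. It is not the paper's implementation: plain Python lists stand in for GPU buffers and CUDA-aware MPI messages, the sixth-order accuracy (stencil radius 3) is an assumption made for the example, and all function names are illustrative.

```python
import math

RADIUS = 3  # stencil radius of a sixth-order central difference (assumed order)
# Standard sixth-order first-derivative coefficients for offsets 1, 2, 3.
COEFFS = (3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0)


def derivative(padded, h):
    """Apply the stencil to the interior of a halo-padded block."""
    out = []
    for i in range(RADIUS, len(padded) - RADIUS):
        acc = 0.0
        for k, c in enumerate(COEFFS, start=1):
            acc += c * (padded[i + k] - padded[i - k])
        out.append(acc / h)
    return out


def exchange_halos(blocks):
    """Pad each block with RADIUS cells from its periodic neighbours.
    In a distributed code, these copies are what the messages carry."""
    n = len(blocks)
    padded = []
    for r, block in enumerate(blocks):
        left = blocks[(r - 1) % n][-RADIUS:]
        right = blocks[(r + 1) % n][:RADIUS]
        padded.append(left + block + right)
    return padded


# Periodic sine wave split across 4 hypothetical ranks.
n_total, n_ranks = 64, 4
h = 2.0 * math.pi / n_total
field = [math.sin(i * h) for i in range(n_total)]
size = n_total // n_ranks
blocks = [field[r * size:(r + 1) * size] for r in range(n_ranks)]

result = [v for p in exchange_halos(blocks) for v in derivative(p, h)]
max_err = max(abs(v - math.cos(i * h)) for i, v in enumerate(result))
print(max_err < 1e-6)  # True
```

Each block needs RADIUS cells from both neighbours on every update, so the halo volume grows with the stencil order while the per-block compute shrinks as more devices are added. That ratio is what limits strong scaling.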

Code & Models

Repositories

https://bitbucket.org/jpekkila/pekkila-2021-artifacts
Official
