Accelerating Communication for Parallel Programming Models on GPU Systems

Jaemin Choi; Zane Fink; Sam White; Nitin Bhat; David F. Richards; Laxmikant V. Kale

arXiv:2102.12416 · cs.DC · March 23, 2022

TL;DR

This paper demonstrates how the UCX framework can be used to accelerate GPU-aware communication in parallel programming models, significantly improving latency and bandwidth for microbenchmarks and real-world applications on GPU systems.

Contribution

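The work presents a unified GPU-aware communication layer, built on UCX, that serves the parallel programming models of the Charm++ ecosystem (Charm++, Adaptive MPI, and Charm4py) and substantially improves latency, bandwidth, and application performance.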

Findings

01. Latency improvements of up to 17.4x across models

02. Bandwidth increases of up to 10.5x

03. Application performance improvements of up to 19.7x

Abstract

As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.1x in Charm++, 11.7x in AMPI, and…
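The abstract reports latency and bandwidth from microbenchmarks adapted from the OSU suite, and the findings state results as speedup ratios. As a rough illustration of that arithmetic only, the sketch below assumes the usual OSU conventions: one-way latency is half the average round-trip time of a ping-pong loop, bandwidth is computed from a window of back-to-back messages with MB taken as 10^6 bytes, and speedup is baseline divided by improved. The two Python threads and queues are hypothetical stand-ins for two ranks; nothing here touches a GPU, UCX, or Charm++, and it is not the paper's benchmark code.

```python
import queue
import threading
import time


def one_way_latency_us(elapsed_s: float, iterations: int) -> float:
    """Each iteration is a full round trip (ping + pong), so the
    one-way latency is total time divided by 2 * iterations."""
    return elapsed_s * 1e6 / (2 * iterations)


def bandwidth_mb_s(msg_bytes: int, window: int, iterations: int, elapsed_s: float) -> float:
    """`window` messages of `msg_bytes` are sent back-to-back per
    iteration; report MB/s assuming MB = 1e6 bytes."""
    return msg_bytes * window * iterations / elapsed_s / 1e6


def speedup(baseline: float, improved: float) -> float:
    """Speedup as a ratio: a 10.1x latency improvement means baseline / improved = 10.1."""
    return baseline / improved


def ping_pong(msg_bytes: int, iterations: int, warmup: int = 10) -> float:
    """Measure one-way latency between two threads echoing a buffer.
    Hypothetical stand-in for two ranks; no GPU or UCX is involved."""
    to_pong: queue.Queue = queue.Queue()
    to_ping: queue.Queue = queue.Queue()
    payload = bytes(msg_bytes)

    def ponger() -> None:
        for _ in range(warmup + iterations):
            to_ping.put(to_pong.get())  # echo each message back to the sender

    t = threading.Thread(target=ponger)
    t.start()
    start = time.perf_counter()
    for i in range(warmup + iterations):
        if i == warmup:
            start = time.perf_counter()  # start timing only after warm-up rounds
        to_pong.put(payload)
        to_ping.get()
    elapsed = time.perf_counter() - start
    t.join()
    return one_way_latency_us(elapsed, iterations)


if __name__ == "__main__":
    for size in (8, 1024, 65536):
        print(f"{size:>6} B  {ping_pong(size, 1000):8.2f} us")
```

Halving the round trip avoids needing synchronized clocks on the two endpoints, which is why ping-pong latency tests measure both legs on a single timer.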

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Caching and Content Delivery