TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes

Carl Pearson; Kun Wu; I-Hsin Chung; Jinjun Xiong; Wen-Mei Hwu

arXiv:2012.14363·cs.DC·April 22, 2021·1 cites

TL;DR

This paper introduces TEMPI, a novel MPI interposer library that optimizes handling of CUDA-aware non-contiguous datatypes, significantly improving MPI communication performance on GPU-enabled systems.

Contribution

It presents a new handling strategy for nested strided datatypes and models the performance of non-contiguous data handling from empirical system measurements, using that model to transparently reduce MPI_Send/Recv latency.

Findings

01 MPI_Pack speedup of up to 242,000x

02 MPI_Send speedup of up to 59,000x

03 More than 917x speedup in a 3D halo exchange

Abstract

MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications. These datatypes are recursively constructed at runtime from primitive Named Types defined in the MPI standard. More recently, the development and deployment of CUDA-aware MPI implementations have encouraged the transition of distributed high-performance MPI codes to use GPUs. Such implementations allow MPI functions to operate directly on GPU buffers, easing integration of GPU compute into MPI codes. This work first presents a novel datatype handling strategy for nested strided datatypes, which finds a middle ground between the specialized and generic handling in prior work. This work also shows that the performance characteristics of non-contiguous data handling can be modeled with empirical system measurements, and used to transparently improve MPI_Send/Recv latency. Finally,…
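To make the idea of a nested strided datatype concrete, the following is a minimal host-only sketch in C; it is not TEMPI's code and does not call MPI or CUDA. The names `StridedDim`, `Strided3D`, and `pack_strided` are hypothetical and introduced only for illustration. The sketch assumes a layout with a contiguous block repeated along two strided dimensions, roughly what an application might describe by nesting `MPI_Type_vector` calls, and packs it into a contiguous buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: one strided dimension. `count` repetitions, each starting
   `stride` bytes after the previous one. */
typedef struct {
    size_t count;
    size_t stride;
} StridedDim;

/* Hypothetical: a contiguous block of `block` bytes nested inside two
   strided dimensions (rows within planes). */
typedef struct {
    size_t block;     /* contiguous bytes at the innermost level */
    StridedDim row;   /* repetitions of the block */
    StridedDim plane; /* repetitions of a full set of rows */
} Strided3D;

/* Copy every block described by `t` from `src` into the contiguous buffer
   `dst`, in plane-major then row-major order. Returns bytes written.
   The caller must size `dst` to block * row.count * plane.count bytes. */
static size_t pack_strided(const Strided3D *t,
                           const unsigned char *src,
                           unsigned char *dst)
{
    size_t out = 0;
    for (size_t z = 0; z < t->plane.count; ++z) {
        for (size_t y = 0; y < t->row.count; ++y) {
            const unsigned char *p =
                src + z * t->plane.stride + y * t->row.stride;
            memcpy(dst + out, p, t->block);
            out += t->block;
        }
    }
    return out;
}
```

Reducing a recursively built type to a small fixed description like this is one way to read "a middle ground between specialized and generic handling": the loop nest is fixed, while the counts and strides remain runtime parameters. On a GPU the two loops would plausibly map to a kernel grid rather than `memcpy` calls, but that mapping is an assumption here, not something this page states.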

Code & Models

Repositories

cwpearson/tempi (official)

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems