Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Aamir Shafi; Jahanzeb Maqbool Hashmi; Hari Subramoni; Dhabaleswar; K. Panda

arXiv:2101.08878·cs.DC·January 25, 2021

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar, K. Panda

PDF

1 Repo

TL;DR

This paper introduces MPI4Dask, a GPU-aware MPI communication backend for Dask that significantly improves latency and throughput on HPC clusters with GPUs, outperforming existing UCX-based solutions.

Contribution

The paper presents MPI4Dask, a novel MPI-based communication backend for Dask that leverages mpi4py over MVAPICH2-GDR, enabling faster GPU-accelerated distributed computing.

Findings

01

MPI4Dask outperforms UCX by 6x for 1-byte messages and 4x for large messages.

02

MPI4Dask accelerates two benchmark applications by over 3x on a GPU cluster.

03

MPI4Dask scales efficiently up to 32 workers on a GPU system.

Abstract

Dask is a popular parallel and distributed computing framework, which rivals Apache Spark to enable task-based scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks using UCX-Py -- a Cython wrapper to UCX. This paper presents the design and implementation of a new communication backend for Dask -- called MPI4Dask -- that is targeted for modern HPC clusters built with GPUs. MPI4Dask exploits mpi4py over MVAPICH2-GDR, which is a GPU-aware implementation of the Message Passing Interface (MPI) standard. MPI4Dask provides point-to-point asynchronous I/O communication coroutines, which are non-blocking concurrent operations defined using the async/await keywords from the Python's asyncio…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

aamirshafi/distributed
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.