Integration of CUDA Processing within the C++ library for parallelism and concurrency (HPX)

Patrick Diehl; Madhavan Seshadri; Thomas Heller; Hartmut Kaiser

arXiv:1810.11482·cs.DC·March 7, 2019

TL;DR

This paper extends the HPX C++ runtime system to seamlessly integrate CUDA GPU processing, enabling asynchronous data transfers and kernel launches across distributed systems for improved resource utilization.

Contribution

It introduces a novel integration of CUDA within HPX, allowing asynchronous GPU operations to be managed within the HPX execution model for distributed applications.
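The idea of managing GPU operations as asynchronous tasks can be sketched with standard C++ futures. The example below is only an illustration under stated assumptions: it uses `std::async` and `std::future` from the standard library rather than the HPX or HPXCL API, and `DeviceBuffer`, `copy_to_device_async`, `launch_scale_async`, `copy_to_host_async`, `sum_on_cpu`, and `run_pipeline` are hypothetical names that simulate the device with host memory. It shows how a transfer, a kernel launch, and a copy back can be chained through futures while the CPU continues with its own work.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device buffer; here it is ordinary host memory.
struct DeviceBuffer {
    std::vector<float> data;
};

// Stand-in for an asynchronous host-to-device copy (assumption: no real GPU is used).
std::future<DeviceBuffer> copy_to_device_async(std::vector<float> host) {
    return std::async(std::launch::async, [h = std::move(host)]() mutable {
        return DeviceBuffer{std::move(h)};
    });
}

// Stand-in for an asynchronous kernel launch: waits for its input future,
// then multiplies every element by `factor`.
std::future<DeviceBuffer> launch_scale_async(std::future<DeviceBuffer> input, float factor) {
    assert(input.valid());
    return std::async(std::launch::async, [in = std::move(input), factor]() mutable {
        DeviceBuffer buf = in.get();  // dependency on the preceding transfer
        for (float& x : buf.data) {
            x *= factor;
        }
        return buf;
    });
}

// Stand-in for an asynchronous device-to-host copy.
std::future<std::vector<float>> copy_to_host_async(std::future<DeviceBuffer> input) {
    assert(input.valid());
    return std::async(std::launch::async, [in = std::move(input)]() mutable {
        return in.get().data;
    });
}

// Independent CPU work that can overlap with the simulated device pipeline.
double sum_on_cpu(const std::vector<float>& values) {
    double total = 0.0;
    for (float v : values) {
        total += v;
    }
    return total;
}

// Chains the three asynchronous steps and overlaps them with CPU work.
// Returns the scaled values and the CPU-side sum of the original input.
std::pair<std::vector<float>, double> run_pipeline(const std::vector<float>& host, float factor) {
    auto on_device = copy_to_device_async(host);
    auto scaled = launch_scale_async(std::move(on_device), factor);
    auto result = copy_to_host_async(std::move(scaled));

    double cpu_sum = sum_on_cpu(host);  // runs while the chain is in flight

    return {result.get(), cpu_sum};  // single synchronization point
}
```

Calling `run_pipeline({1.0f, 2.0f, 3.0f}, 2.0f)` blocks only at the final `get()`. In this sketch each step passes its future to the next one, so the ordering of transfer, kernel, and copy back is expressed through data dependencies instead of explicit barriers.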

Findings

01

Asynchronous GPU data transfers and kernel launches are effectively integrated into HPX.

02

The approach enables full utilization of local and remote GPUs in distributed systems.

03

Overhead measurements show no additional computational cost from integration.

Abstract

Experience shows that on today's high-performance systems, utilizing different acceleration cards while keeping all other parts of the system highly utilized is difficult. Future architectures, like exascale clusters, are expected to aggravate this issue as the number of cores is expected to increase and memory hierarchies are expected to become deeper. A key concern for distributed applications is to guarantee high utilization of all available resources, including local or remote acceleration cards on a cluster, while fully using all available CPU resources and integrating the GPU work into the overall programming model. For the integration of CUDA code we extended HPX, a general-purpose C++ runtime system for parallel and distributed applications of any scale, and enabled asynchronous data transfers from and to the GPU device and the asynchronous…

Code & Models

Repositories

STEllAR-GROUP/hpxcl
