Intel(R) SHMEM: GPU-initiated OpenSHMEM using SYCL

Alex Brooks; Philip Marshall; David Ozog; Md. Wasi-ur-Rahman; Lawrence Stewart; Rithwik Tom

arXiv:2409.20476 · cs.DC · October 1, 2024

Open Access

TL;DR

This paper introduces Intel SHMEM, a GPU-aware communication library for heterogeneous systems that supports GPU-initiated operations through OpenSHMEM-style calls embedded in GPU kernels.

Contribution

The paper presents a GPU-initiated communication library built on OpenSHMEM and proposes thread-collaborative extensions to the OpenSHMEM standard intended to make better use of the GPU.

Findings

01 Supports GPU memory in API calls

02 Enables GPU-initiated communication within kernels

03 Adapts transfer methods for optimal performance

Abstract

Modern high-end systems are increasingly heterogeneous, giving users the option of using general-purpose Graphics Processing Units (GPUs) and other accelerators for additional performance. High Performance Computing (HPC) and Artificial Intelligence (AI) applications are often carefully arranged to overlap communication and computation for increased efficiency on such platforms. This has led to efforts to extend popular communication libraries to support GPU awareness and, more recently, GPU-initiated operations. In this paper, we present Intel SHMEM, a library that enables users to write programs that are GPU-aware, in that API calls support GPU memory, and that also support GPU-initiated communication operations by embedding OpenSHMEM-style calls within GPU kernels. We also propose thread-collaborative extensions to the OpenSHMEM standard that can enable users to better exploit the…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Model Reduction and Neural Networks