Experience Report: Writing A Portable GPU Runtime with OpenMP 5.1
Shilei Tian, Jon Chesterfield, Johannes Doerfert, Barbara Chapman

TL;DR
This paper demonstrates that OpenMP 5.1, with minor compiler extensions, can be used to write a portable GPU runtime for Nvidia and AMD GPUs that matches CUDA performance and needs neither CUDA nor HIP for its development or compilation.
Contribution
The work shows how to implement a performant GPU runtime using OpenMP 5.1 with minimal compiler extensions, enabling portability and ease of integration.
Findings
OpenMP 5.1 can replace CUDA-based GPU runtimes without performance loss.
A portable GPU runtime was developed using LLVM/Clang for Nvidia and AMD GPUs.
Matching CUDA performance required compiler extensions beyond OpenMP 5.1; if future OpenMP versions adopt them, device code would be portable across compilers as well as across execution platforms.
Abstract
GPU runtimes are historically implemented in CUDA or other vendor-specific languages dedicated to GPU programming. In this work, we show that OpenMP 5.1, with minor compiler extensions, is capable of replacing existing solutions without a performance penalty. The result is a performant and portable GPU runtime that can be compiled with LLVM/Clang for Nvidia and AMD GPUs without the need for CUDA or HIP during its development and compilation. While we tried to be OpenMP compliant, we identified the need for compiler extensions to match CUDA performance with our OpenMP runtime. We hope that future versions of OpenMP adopt our extensions to make device programming in OpenMP portable across compilers, not only across execution platforms. The library we ported to OpenMP is the OpenMP device runtime, which provides OpenMP functionality on the GPU. This work opens the door for…
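For readers unfamiliar with device code written in OpenMP rather than CUDA or HIP, the following is a minimal sketch of the general idea, not code from the paper. It uses only standard directives: `declare target` marks a function for device compilation, and `target` offloads a region that calls it. The names `clamp_thread_count` and `run_on_device` are hypothetical and chosen for illustration; the compiler extensions the abstract refers to are not shown.

```cpp
// Sketch only: clamp_thread_count and run_on_device are hypothetical names,
// not functions from the paper's runtime.

// Mark the helper for device compilation so a target region can call it.
#pragma omp declare target
int clamp_thread_count(int requested, int limit) {
  return requested < limit ? requested : limit;
}
#pragma omp end declare target

// Offload one call to the default device and copy the result back.
// By default, if no device is available or OpenMP is disabled,
// the region simply runs on the host.
int run_on_device(int requested, int limit) {
  int result = 0;
#pragma omp target map(to: requested, limit) map(from: result)
  { result = clamp_thread_count(requested, limit); }
  return result;
}
```

Calling `run_on_device(1024, 256)` returns the clamped value whether the region ran on a GPU or fell back to the host, which is the sense in which the same source is portable across execution platforms.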
