Implementing implicit OpenMP data sharing on GPUs

Gheorghe-Teodor Bercea; Carlo Bertolli; Arpith C. Jacob; Alexandre Eichenberger; Alexey Bataev; Georgios Rokos; Hyojin Sung; Tong Chen; Kevin O'Brien

arXiv:1711.10413·cs.PL·November 29, 2017

TL;DR

This paper redesigns OpenMP's data sharing on GPUs by mapping implicitly shared local variables to shared memory, improving memory management and compatibility with CUDA semantics.

Contribution

It introduces a new data sharing infrastructure in Clang/LLVM that maps implicit local variable sharing to GPU shared memory, addressing CUDA-OpenMP semantic differences.

Findings

01. Low shared memory usage for scalar variables (under 26%)

02. No negative impact on GPU occupancy

03. Effective control over implicit shared memory allocation

Abstract

OpenMP is a shared-memory programming model that supports the offloading of target regions to accelerators such as NVIDIA GPUs. The implementation in Clang/LLVM aims to deliver a generic GPU compilation toolchain that supports both the native CUDA C/C++ and the OpenMP device offloading models. There are situations where the semantics of OpenMP and those of CUDA diverge. One such example is the policy for implicitly handling local variables. In CUDA, local variables are implicitly mapped to thread-local memory and thus become private to a CUDA thread. In OpenMP, due to semantics that allow the nesting of regions executed by different numbers of threads, variables need to be implicitly shared among the threads of a contention group. In this paper we introduce a re-design of the OpenMP device data sharing infrastructure that is responsible for the implicit sharing of local…
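
The divergence described in the abstract can be illustrated with a minimal sketch. It is not code from the paper: the function name `scale_by_first` and the variable `factor` are hypothetical, chosen only for illustration. The local variable `factor` is declared in the sequential part of a target region, where a single thread executes, and is then read inside a nested parallel region, where OpenMP's default rules make it shared among the threads. A CUDA-style placement in thread-local memory would not make that value visible to the other threads, which is the situation the data sharing infrastructure has to handle.

```cpp
#include <vector>

// Hypothetical example (not from the paper): multiplies every element by
// twice the first element. Falls back to host execution when no offload
// device is available, and runs serially if OpenMP is disabled.
std::vector<int> scale_by_first(const std::vector<int>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(in.size());
    if (n == 0) return out;

    const int* src = in.data();
    int* dst = out.data();

    #pragma omp target map(to: src[0:n]) map(from: dst[0:n])
    {
        // Sequential part of the target region: executed by one thread only.
        // `factor` is an ordinary local variable with no data-sharing clause.
        int factor = 2 * src[0];

        // Nested parallel region: `factor` is referenced by all threads and
        // is implicitly shared under OpenMP's default data-sharing rules.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            dst[i] = src[i] * factor;
        }
    }
    return out;
}
```

Calling `scale_by_first` on `{1, 2, 3, 4}` yields `{2, 4, 6, 8}`. Whether an implementation places `factor` in GPU shared memory, global memory, or elsewhere is a compiler decision; the sketch only shows the source-level pattern that requires sharing.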
