DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP
Baodi Shan, Mauricio Araya-Polo, Barbara Chapman

TL;DR
DiOMP is a portable distributed OpenMP framework that unifies heterogeneous GPU offloading and global memory management, improving scalability and programmability in high-performance computing environments.
Contribution
The paper introduces DiOMP, a framework that integrates OpenMP target offloading with the PGAS model for transparent, scalable distributed GPU programming.
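To make the PGAS idea concrete, the following is a minimal sketch of how a global index could be resolved to an owning process and a local offset, for both equal-sized (symmetric) and per-rank-sized (asymmetric) segments. It is a generic illustration of the partitioned-address concept, not DiOMP's API: the type `pgas_loc` and the functions `pgas_locate_symmetric` and `pgas_locate_asymmetric` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: where an element of a partitioned global array lives. */
typedef struct {
    int rank;       /* owning process, or -1 if the index is out of range */
    size_t offset;  /* index within that process's local segment */
} pgas_loc;

/* Symmetric case (assumption): every rank owns exactly seg_len elements. */
pgas_loc pgas_locate_symmetric(size_t global_idx, size_t seg_len) {
    assert(seg_len > 0);
    pgas_loc loc;
    loc.rank = (int)(global_idx / seg_len);
    loc.offset = global_idx % seg_len;
    return loc;
}

/* Asymmetric case (assumption): rank r owns seg_lens[r] elements,
 * laid out back to back in the global index space. */
pgas_loc pgas_locate_asymmetric(size_t global_idx,
                                const size_t *seg_lens, int nranks) {
    pgas_loc loc = { -1, 0 };
    size_t base = 0;  /* first global index owned by the current rank */
    for (int r = 0; r < nranks; ++r) {
        if (global_idx < base + seg_lens[r]) {
            loc.rank = r;
            loc.offset = global_idx - base;
            return loc;
        }
        base += seg_lens[r];
    }
    return loc;  /* not found: rank stays -1 */
}
```

In a real PGAS runtime the resolved `(rank, offset)` pair would feed a one-sided put or get; the sketch stops at address resolution and performs no communication.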
Findings
Improved performance in micro-benchmarks and applications on NVIDIA A100, Grace Hopper, and AMD MI250X systems.
Improved scalability and programmability over traditional hybrid programming models.
Successful application to matrix multiplication and Minimod benchmarks.
Abstract
As core counts and heterogeneity rise in HPC, traditional hybrid programming models face challenges in managing distributed GPU memory and ensuring portability. This paper presents DiOMP, a distributed OpenMP framework that unifies OpenMP target offloading with the Partitioned Global Address Space (PGAS) model. Built atop LLVM/OpenMP and using GASNet-EX or GPI-2 for communication, DiOMP transparently handles global memory, supporting both symmetric and asymmetric GPU allocations. It leverages OMPCCL, a portable collective communication layer compatible with vendor libraries. DiOMP simplifies programming by abstracting device memory and communication, achieving superior scalability and programmability over traditional approaches. Evaluations on NVIDIA A100, Grace Hopper, and AMD MI250X show improved performance in micro-benchmarks and applications like matrix multiplication and Minimod,…
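For readers unfamiliar with OpenMP target offloading, the single-node building block that the abstract refers to can be sketched with standard OpenMP directives applied to matrix multiplication, one of the evaluated workloads. This is not code from the paper and shows no distributed or PGAS functionality; the function name `matmul_offload` is an assumption made for illustration. Compiled without OpenMP support, or run without an available device, the loop executes on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C = A * B for n x n row-major matrices.
 * The target construct asks the compiler to run the loop nest on a device;
 * the map clauses copy A and B to the device and C back to the host. */
void matmul_offload(const double *a, const double *b, double *c, int n) {
    assert(n > 0);
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The explicit `map` clauses are the per-device memory management that, according to the abstract, DiOMP aims to abstract away across nodes.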
Taxonomy
Topics: IoT and Edge/Fog Computing · Modular Robots and Swarm Intelligence · Molecular Communication and Nanonetworks
