Portability of Fortran's 'do concurrent' on GPUs

Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, and Nicholas Malaya

arXiv:2408.07843·cs.PL·December 25, 2024

TL;DR

This paper tests how portable Fortran's 'do concurrent' construct is for GPU offload across NVIDIA, Intel, and AMD platforms, using the production solar surface flux evolution code HipFT, and reports implementation details and performance results.

Contribution

It analyzes the portability of 'do concurrent' across GPU vendors, discusses when directive APIs for data movement are needed or preferable to a unified memory system, and evaluates performance on both data center and consumer GPUs.

Findings

01. Successful use of 'do concurrent' on NVIDIA, Intel, and AMD GPUs.

02. Implementation details vary depending on vendor and platform.

03. Performance results demonstrate viability across multiple GPU types.

Abstract

There is continuing interest in using standard language constructs for accelerated computing in order to avoid external (and sometimes vendor-specific) APIs. For Fortran codes, the 'do concurrent' (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of DC across GPU vendors using the in-production solar surface flux evolution code HipFT. We discuss implementation and compilation details, including when and where directive APIs for data movement are needed or desirable compared to relying on a unified memory system. The performance achieved on both data center and consumer platforms is shown.
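
As a minimal sketch of the kind of loop the abstract refers to (this is not code from HipFT; the program name, array names, and problem size are hypothetical), the following standard Fortran program expresses a vector update with 'do concurrent'. The OpenACC data directives around the loop illustrate one way data movement might be managed manually when a unified memory system is not used. They are ordinary comments to a compiler that is not building with OpenACC, so the program also runs serially on a CPU. Whether and how the loop is offloaded depends on the compiler and its flags (for example, nvfortran offloads 'do concurrent' when given -stdpar=gpu).

```fortran
program dc_axpy_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000      ! hypothetical problem size
  integer :: i
  real(dp) :: a
  real(dp), allocatable :: x(:), y(:)

  allocate(x(n), y(n))
  a = 2.0_dp
  x = 1.0_dp
  y = 3.0_dp

  ! Assumption: manual data movement via OpenACC, used only when the
  ! build does not rely on unified/managed memory. Without OpenACC
  ! enabled, this line is just a comment.
  !$acc enter data copyin(x, y)

  ! Standard Fortran: iterations are declared independent, so the
  ! compiler may run them in any order or in parallel (e.g. on a GPU).
  do concurrent (i = 1:n)
    y(i) = a*x(i) + y(i)
  end do

  ! Copy the result back to the host and release the device copies.
  !$acc exit data copyout(y) delete(x)

  ! Every element should equal 2*1 + 3 = 5, so the deviation should be zero.
  print *, 'max deviation from 5: ', maxval(abs(y - 5.0_dp))

  deallocate(x, y)
end program dc_axpy_sketch
```

On a platform with unified (managed) memory, the two directives can be dropped and the loop left as pure standard Fortran. Which approach is needed or preferable on a given vendor's platform is what the paper examines.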

Code & Models

Repositories

predsci/hipft (official)