Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Joshua Hoke Davis; Christopher Daley; Swaroop Pophale; Thomas Huber; Sunita Chandrasekaran; Nicholas J. Wright

arXiv:2010.09454·cs.PF·December 4, 2020

TL;DR

This paper evaluates the performance of OpenMP offloading compilers on NVIDIA V100 GPUs across five proxy applications, revealing significant variability and providing best practices for application developers.

Contribution

It demonstrates the performance differences among OpenMP compilers on GPUs and offers restructuring strategies to improve application speedups.

Findings

01. Up to 18x speedup with Clang on the su3 application

02. Up to 15.7x speedup with Cray-llvm on the laplace mini-app

03. Performance varies widely across OpenMP compilers

Abstract

Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today's systems to tomorrow's. Over the past decade and more, directives have been established as one of the promising paths to tackle programmatic challenges on emerging systems. This work focuses on applying and demonstrating OpenMP offloading directives on five proxy applications. We observe that the performance varies widely from one compiler to the other; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, there are other issues that must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x…
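For readers unfamiliar with the directives the abstract refers to, the sketch below shows what an OpenMP offloading directive can look like on a Laplace-style stencil loop. It is not taken from the paper: the function names `jacobi_sweep` and `jacobi_demo`, the grid layout, and the choice of clauses are all assumptions made for illustration, and the paper's own kernels and restructurings may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: one Jacobi sweep on an n x n row-major grid.
 * Each interior point becomes the average of its four neighbours.
 * The function name and signature are hypothetical, not from the paper. */
void jacobi_sweep(const double *in, double *out, int n)
{
    assert(n >= 3); /* need at least one interior point */

    /* Offload the loop nest to the default device. The map clauses copy
     * `in` to the device and copy `out` in both directions, so boundary
     * values already stored in `out` are preserved. If the compiler is
     * invoked without OpenMP support, the pragma is ignored and the
     * loops run serially on the host. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: in[0:n*n]) map(tofrom: out[0:n*n])
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                   + in[i * n + (j - 1)] + in[i * n + (j + 1)]);
        }
    }
}

/* Hypothetical helper: 4 x 4 grid with the top row set to 1.0 and all
 * other points 0.0; returns the interior value at (1, 1) after one sweep. */
double jacobi_demo(void)
{
    enum { N = 4 };
    double in[N * N] = {0};
    double out[N * N] = {0};

    for (int j = 0; j < N; ++j) {
        in[j] = 1.0;            /* top boundary row */
    }
    for (int k = 0; k < N * N; ++k) {
        out[k] = in[k];         /* carry boundary values into the output */
    }

    jacobi_sweep(in, out, N);
    return out[1 * N + 1];
}
```

In this sketch, the point at (1, 1) has exactly one neighbour equal to 1.0, so `jacobi_demo()` returns 0.25. The combined `target teams distribute parallel for` construct is one common way to express such a loop; how well a given compiler handles it, and whether splitting or rearranging the directives helps, is the kind of question the paper studies.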
