JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization
Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña

TL;DR
JACC is a runtime framework that enhances OpenACC for multi-GPU systems, enabling automatic code distribution and near-linear scaling, thus improving performance without extensive code modifications.
Contribution
It introduces a dynamic, transparent OpenACC runtime layer that facilitates multi-GPU utilization and automatic code distribution, addressing limitations of compiler directives.
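To make the idea of automatic code distribution concrete, the following minimal sketch shows one way a parallel loop's iteration space could be divided into contiguous chunks, one per GPU. This is only an illustration under stated assumptions: the summary does not describe JACC's actual partitioning scheme, and the function name `split_range` and the even-split policy are hypothetical.

```python
def split_range(n_iters, n_devices):
    """Split the iteration space [0, n_iters) into n_devices contiguous chunks.

    Hypothetical helper for illustration only; not part of JACC or OpenACC.
    Chunk sizes differ by at most one iteration.
    """
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    base, extra = divmod(n_iters, n_devices)
    chunks = []
    start = 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional iteration each.
        size = base + (1 if dev < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    for dev, (lo, hi) in enumerate(split_range(10, 4)):
        print(f"device {dev}: iterations [{lo}, {hi})")
```

A runtime that distributes work this way would also need to move the data each chunk touches to the matching device and exchange boundary data between devices, which is where the communication latency mentioned in the findings would arise.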
Findings
Nearly linear scaling with NVIDIA V100 GPUs
Automatic multi-GPU code distribution achieved
Performance improvements offset GPU communication latency
Abstract
The rapid development of computing technology has given directive-based programming models a principal role in maintaining the software portability of performance-critical applications. Such models keep the engineering cost of enabling acceleration on multiple architectures low, since programmers only need to add meta-information to sequential code. Obtaining the best possible efficiency, however, is often challenging. Directives inserted by the programmer can have side effects that limit the optimizations available to the compiler, which can degrade performance. The problem is exacerbated on multi-GPU systems, because pragmas do not automatically adapt to such systems and require expensive, time-consuming code adjustments by programmers. This paper introduces JACC, an OpenACC…
