Implementing Performance Portability of High Performance Computing Programs in the New Golden Age of Chip Architecture

Weifeng Liu; Linping Wu; Xiaowen Xu; Yuren Wang

arXiv:2308.13802 · cs.AR · August 29, 2023

Open Access

TL;DR

This paper reviews current techniques for achieving performance portability in high-performance computing across diverse hardware architectures, emphasizing programming models, automatic parallelization, and library use.

Contribution

It provides a comprehensive summary of existing performance portability technologies and discusses how to select suitable solutions based on application scenarios.

Findings

01. Different architectures require tailored performance portability strategies.

02. Using scientific computing libraries enhances performance portability.

03. Trade-offs exist between programming efficiency and optimization.

Abstract

Performance portability has long been an important goal of high-performance computing. With the breakdown of Moore's Law, it is no longer feasible to improve computer performance simply by adding more of the existing hardware. Innovation in high-performance computer architecture is therefore imperative, and as a result high-performance computers with multiple architectures coexist in production environments. For example, current high-performance computing nodes often pair general-purpose processors with co-accelerators such as general-purpose GPUs and Intel Xeon Phi coprocessors. With the flourishing of deep learning, dedicated neural network acceleration chips are also emerging. The emergence of co-accelerators with different architectures and their wide application in high-performance computers have challenged the performance portability of programs between…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques