Closing the Performance Gap with Modern C++
Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, and Marc Alexander Schweitzer

TL;DR
This paper introduces a high-level C++ abstraction for parallelism that targets diverse hardware architectures such as GPUs, SIMD units, and CPUs, enabling portable code with performance comparable to native benchmarks.
Contribution
The paper presents a novel high-level C++ programming abstraction for parallelism that achieves performance portability across heterogeneous architectures.
Findings
Performance comparable to native benchmarks
Supports a wide range of hardware architectures
Provides a uniform programming API
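To make the idea of a uniform programming API concrete, here is a minimal sketch in standard C++17. It is not the paper's actual interface: the names `sequential_policy`, `threaded_policy`, `for_each_index`, and `saxpy` are hypothetical and chosen only for illustration. The sketch assumes that the execution model is selected by a policy argument while the loop body stays unchanged; a GPU or SIMD back end would be a further overload and is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical policy tags (illustrative names, not the paper's API).
struct sequential_policy {};
struct threaded_policy { unsigned num_threads = 2; };

// Sequential overload: a plain loop over the index range [0, n).
template <typename F>
void for_each_index(sequential_policy, std::size_t n, F f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Threaded overload: splits [0, n) into contiguous chunks, one thread per chunk.
template <typename F>
void for_each_index(threaded_policy p, std::size_t n, F f) {
    const std::size_t workers = std::max<std::size_t>(1, p.num_threads);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) f(i);
        });
    }
    for (auto& t : pool) t.join();  // wait for every chunk to finish
}

// The same kernel source runs under either policy: y = a * x + y.
template <typename Policy>
void saxpy(Policy policy, float a, const std::vector<float>& x,
           std::vector<float>& y) {
    assert(x.size() == y.size());
    for_each_index(policy, x.size(),
                   [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
}
```

Calling `saxpy(sequential_policy{}, a, x, y)` and `saxpy(threaded_policy{4}, a, x, y)` runs the same kernel on one thread or on four; only the policy argument changes. On some toolchains the threaded overload may need to be built with `-pthread`.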
Abstract
On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as hardware architectures are becoming more and more diverse. Today's heterogeneous systems often include two or more completely distinct and incompatible hardware execution models, such as GPGPUs, SIMD vector units, and general-purpose cores, which conventionally have to be programmed using separate tool chains representing non-overlapping programming models. The recent revival of interest in the industry and the wider community for the C++ language has spurred a remarkable number of standardization proposals and technical specifications in the arena of concurrency and parallelism. This recently includes an increasing amount of discussion around the need…
