TL;DR
targetDP is a lightweight programming layer that abstracts data parallelism for structured grid applications, enabling portable high-performance code across CPUs and GPUs with minimal modifications.
Contribution
The paper introduces targetDP, a portable abstraction layer for data parallelism that targets both CPU and GPU architectures and is implemented with standard C preprocessor macros and library functions.
Findings
Demonstrates performance portability across CPU and GPU platforms.
Shows optimization benefits from exposing instruction-level parallelism.
Supports incremental adoption in existing applications.
Abstract
To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer's perspective, it is also important that code can be maintained in a portable manner across a range of hardware. Here we present targetDP (target Data Parallel), a lightweight programming layer that allows the abstraction of data parallelism for applications that employ structured grids. A single source code may be used to target both thread level parallelism (TLP) and instruction level parallelism (ILP) on either SIMD multi-core CPUs or GPU-accelerated platforms. targetDP is implemented via standard C preprocessor macros and library functions, can be added to existing applications incrementally, and can be combined with higher-level paradigms such as MPI. We present CPU and GPU performance results for a benchmark taken from the…
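As a rough illustration of the idea described in the abstract, the sketch below shows how C preprocessor macros can split a loop over grid sites into an outer thread-level loop and a short inner loop that a compiler may vectorize. The macro names (`VVL`, `FOR_TLP`, `FOR_ILP`) and the function `scale_field` are hypothetical and chosen for this example only; they are not targetDP's actual API. The sketch covers only a CPU build using OpenMP, and it assumes the number of sites is a multiple of `VVL`.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not the targetDP API. */

/* Number of consecutive grid sites handled by one inner (ILP) loop. */
#define VVL 4

/* On a CPU build with OpenMP enabled, the outer loop is shared among threads.
   Without OpenMP the pragma disappears and the loop runs serially. */
#ifdef _OPENMP
#define TLP_PRAGMA _Pragma("omp parallel for")
#else
#define TLP_PRAGMA
#endif

/* Thread-level loop: steps over the grid in chunks of VVL sites. */
#define FOR_TLP(base, extent) \
    TLP_PRAGMA \
    for (int base = 0; base < (extent); base += VVL)

/* Instruction-level loop: fixed short trip count, a candidate for SIMD. */
#define FOR_ILP(v) \
    for (int v = 0; v < VVL; v++)

/* Multiply every site of a field on a structured grid by a constant.
   Assumption: nsites is a multiple of VVL (no remainder handling). */
void scale_field(double *field, int nsites, double a)
{
    FOR_TLP(base, nsites) {
        FOR_ILP(v) {
            field[base + v] *= a;
        }
    }
}
```

Calling `scale_field(f, 8, 2.0)` on an array of eight doubles doubles each element in place. The point of the design is that the kernel body is written once against the two macros; a different build could redefine the outer macro (for example, to derive `base` from a GPU thread index) without touching the kernel, which is the kind of single-source portability the abstract describes.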
