Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications
Istvan Z Reguly

TL;DR
This paper assesses SYCL's ability to deliver performance portability across diverse CPUs and GPUs for bandwidth-bound applications, revealing strengths on GPUs and areas for improvement on CPUs.
Contribution
It provides a comprehensive evaluation of SYCL's performance portability across multiple hardware vendors and compilers, highlighting current capabilities and limitations.
Findings
SYCL slightly outperforms native approaches on GPUs.
On CPUs, SYCL performance lags behind native implementations.
SYCL offers a unified programming model for most HPC architectures.
Abstract
In this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three major vendors; we evaluate performance on the Intel(R) Data Center GPU Max 1100, the NVIDIA A100 GPU, and the AMD MI250X GPU. Support on CPUs is currently less established: DPC++ only supports x86 CPUs through OpenCL, whereas OpenSYCL has an OpenMP backend capable of targeting all modern CPUs. We benchmark the Intel Xeon Platinum 8360Y Processor (Ice Lake), the AMD EPYC 9V33X (Genoa-X), and the Ampere Altra platforms. We study a range of primarily bandwidth-bound applications implemented using the OPS and OP2 DSLs, evaluate different formulations in SYCL, and contrast their performance to "native" programming approaches…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed systems and fault tolerance
