SYCL compute kernels for ExaHyPE
Chung Ming Loi, Heinrich Bockhorst, Tobias Weinzierl

TL;DR
This paper explores three SYCL implementations of a Finite Volume scheme, compares their performance, and proposes idioms for realising them effectively; a hybrid of task and data parallelism performs best.
Contribution
It introduces different SYCL realisation idioms for Finite Volume schemes and evaluates their performance, providing guidance for efficient implementation.
Findings
Hybrid task and data parallelism yields the best performance
Nested parallelism and task graph approaches are benchmarked
SYCL-specific idioms improve implementation efficiency
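To make the task-graph flavour concrete, here is a minimal C++17 sketch of a dependency graph. It is not SYCL and not the authors' code. Each node is a unit of work, for example filling one patch's ghost cells or updating one patch. A node becomes ready once everything it depends on has finished. The class name `TaskGraph` and its methods are hypothetical. In SYCL, comparable dependencies could be expressed through events (for example `handler::depends_on`) or through buffer accessors, which lets the runtime overlap independent kernels. This sketch simply runs ready tasks one after another.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal task graph: each task is a callable plus the tasks it
// depends on. Dependencies must already exist, so the graph is always acyclic.
class TaskGraph {
 public:
  std::size_t add(std::function<void()> work, const std::vector<std::size_t>& deps = {}) {
    const std::size_t id = work_.size();
    work_.push_back(std::move(work));
    successors_.emplace_back();
    pending_.push_back(deps.size());
    for (std::size_t d : deps) {
      assert(d < id && "a dependency must be added before its dependents");
      successors_[d].push_back(id);
    }
    return id;
  }

  // Kahn's algorithm: a task becomes ready once all its dependencies have run.
  // This sketch executes ready tasks one at a time; a real runtime would run
  // all ready tasks concurrently. Returns the order in which tasks executed.
  std::vector<std::size_t> run() {
    std::vector<std::size_t> remaining = pending_;
    std::vector<std::size_t> order;
    std::queue<std::size_t> ready;
    for (std::size_t t = 0; t < work_.size(); ++t)
      if (remaining[t] == 0) ready.push(t);
    while (!ready.empty()) {
      const std::size_t t = ready.front();
      ready.pop();
      work_[t]();
      order.push_back(t);
      for (std::size_t s : successors_[t])
        if (--remaining[s] == 0) ready.push(s);
    }
    return order;
  }

 private:
  std::vector<std::function<void()>> work_;         // task bodies
  std::vector<std::vector<std::size_t>> successors_; // edges: task -> dependents
  std::vector<std::size_t> pending_;                 // number of dependencies per task
};
```

A typical (assumed) pattern would add a ghost-cell task per patch, an update task per patch that depends on it, and a final reduction (for example, for the next time step size) that depends on all updates. Independent patches then form independent branches that a parallel runtime could execute side by side.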
Abstract
We discuss three SYCL realisations of a simple Finite Volume scheme over multiple Cartesian patches. The realisation flavours differ in how they map the compute steps onto loops and tasks. We compare an implementation that exclusively uses a sequence of for-loops to a version that uses nested parallelism, and finally benchmark both against a version that models the calculations as a task graph. Our work proposes idioms to realise these flavours within SYCL. The results suggest that a mixture of classic task and data parallelism performs best if we map this hybrid onto a solely data-parallel SYCL implementation, taking into account SYCL specifics and the problem size.
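The idea of mapping work over many patches onto a solely data-parallel implementation can be illustrated by flattening the patch loop and the cell loop into a single index space. The plain C++17 sketch below is not SYCL and not the paper's kernel. It assumes a 1D linear advection equation with a first-order upwind update and one ghost cell per patch, purely for brevity. The struct `Patches` and both functions are hypothetical. The point is that every flattened index `k` is independent, so the loop body is what one data-parallel kernel launch would execute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative setup (an assumption, not the paper's PDE): 1D linear advection
// u_t + a u_x = 0 with a > 0, updated by a first-order upwind Finite Volume step.
// Each patch stores one ghost cell at index 0, followed by N interior cells.
struct Patches {
  std::size_t numPatches;
  std::size_t N;             // interior cells per patch
  std::vector<double> data;  // numPatches * (N + 1) values, patch after patch
  double& at(std::size_t p, std::size_t i) { return data[p * (N + 1) + i]; }
  double at(std::size_t p, std::size_t i) const { return data[p * (N + 1) + i]; }
};

// Loop flavour: an outer loop over patches and an inner loop over cells.
// c is the Courant number a*dt/dx.
Patches updateLoopByLoop(const Patches& in, double c) {
  Patches out = in;  // ghost cells are copied unchanged
  for (std::size_t p = 0; p < in.numPatches; ++p)
    for (std::size_t i = 1; i <= in.N; ++i)
      out.at(p, i) = in.at(p, i) - c * (in.at(p, i) - in.at(p, i - 1));
  return out;
}

// Flattened flavour: one index space over all (patch, cell) pairs. Every k is
// independent, so this loop body is what a single data-parallel kernel would run.
Patches updateFlattened(const Patches& in, double c) {
  assert(in.data.size() == in.numPatches * (in.N + 1));  // layout sanity check
  Patches out = in;  // ghost cells are copied unchanged
  const std::size_t total = in.numPatches * in.N;
  for (std::size_t k = 0; k < total; ++k) {
    const std::size_t p = k / in.N;      // which patch
    const std::size_t i = k % in.N + 1;  // which interior cell (skips the ghost)
    out.at(p, i) = in.at(p, i) - c * (in.at(p, i) - in.at(p, i - 1));
  }
  return out;
}
```

Both functions produce identical results. In a SYCL port, the body of the flattened loop could become the kernel of one `parallel_for` over `sycl::range<1>{numPatches * N}`, with the patch data in device-accessible memory. Whether one large launch beats many small per-patch launches depends on the problem size, which matches the caveat in the abstract.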
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Model-Driven Software Engineering Techniques · Algorithms and Data Compression
