Task-parallelism in SWIFT for heterogeneous compute architectures
Abouzied M. A. Nasar, Benedict D. Rogers, Georgios Fourtakas, Mladen Ivkovic, Tobias Weinzierl, Scott T. Kay, Matthieu Schaller

TL;DR
This paper presents first steps towards GPU acceleration of the task-parallel SPH solver SWIFT, reporting up to 3.5x speedup for offloaded computations and a 1.8x faster full simulation on a single Grace-Hopper superchip.
Contribution
It introduces novel combinations of algorithms that let SWIFT run task-parallel, memory-bound computations on CPUs concurrently with compute-bound computations on GPUs, while minimising the effects of CPU-GPU communication latency.
Findings
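GPU acceleration yields up to 3.5x speedup for offloaded computations when CPU-side preparation and post-processing of data transfers is included, and up to 7.5x when it is excluded.
The full simulation runs 1.8x faster on a single Grace-Hopper superchip.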
GPU acceleration improves energy efficiency by 29%.
Abstract
This paper highlights first steps towards enabling graphics processing unit (GPU) acceleration of the task-parallel smoothed particle hydrodynamics (SPH) solver SWIFT. Novel combinations of algorithms are presented, enabling SWIFT to function as truly heterogeneous software, leveraging task-parallelism on CPUs for memory-bound computations concurrently with GPUs for compute-bound computations while minimising the effects of CPU-GPU communication latency. The proposed algorithms are validated in extensive testing. The GPU acceleration methodology is shown to deliver speedups of up to 3.5x and 7.5x for the offloaded computations when including and excluding the time required to prepare and post-process data transfers on the CPU side, respectively. The overall performance of the GPU-accelerated hydrodynamic solver for a full simulation on a single Grace-Hopper superchip is 1.8 times faster…
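The two speedup figures for offloaded computations (3.5x including CPU-side preparation and post-processing, 7.5x excluding it) can be related by simple arithmetic. The sketch below is illustrative only: it assumes both peak figures come from the same measurement, which the abstract does not state, and the function name `cpu_side_overhead_share` is hypothetical. Under that assumption, it estimates what fraction of the offloaded path is spent on the CPU-side work rather than on the GPU.

```python
def cpu_side_overhead_share(speedup_incl: float, speedup_excl: float) -> float:
    """Estimate the share of offloaded-path time spent on CPU-side
    preparation and post-processing of data transfers.

    Assumption (not from the paper): both speedups are measured against
    the same CPU-only baseline on the same workload.
    """
    if not (0 < speedup_incl <= speedup_excl):
        raise ValueError("expected 0 < speedup_incl <= speedup_excl")
    # Normalise the CPU-only baseline time to 1.
    time_incl = 1.0 / speedup_incl  # offloaded path, CPU-side work included
    time_excl = 1.0 / speedup_excl  # offloaded path, CPU-side work excluded
    return (time_incl - time_excl) / time_incl


if __name__ == "__main__":
    share = cpu_side_overhead_share(3.5, 7.5)
    print(f"CPU-side share of the offloaded path: {share:.1%}")
```

If the assumption holds, roughly half of the offloaded path would be CPU-side data handling, which is consistent with the abstract's emphasis on minimising the effects of CPU-GPU communication latency.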
