Task-parallelism in SWIFT for heterogeneous compute architectures

Abouzied M. A. Nasar; Benedict D. Rogers; Georgios Fourtakas; Mladen Ivkovic; Tobias Weinzierl; Scott T. Kay; Matthieu Schaller

arXiv:2505.14538 · cs.PF · January 22, 2026

TL;DR

This paper presents first steps towards GPU acceleration of the task-parallel SPH solver SWIFT, reporting up to 3.5x speedup for offloaded computations, a 1.8x faster full simulation on a single Grace-Hopper superchip, and a 29% improvement in energy efficiency.

Contribution

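It introduces novel combinations of algorithms that let SWIFT run as heterogeneous software, using task-parallelism on CPUs for memory-bound computations concurrently with GPUs for compute-bound computations, while minimising the effects of CPU-GPU communication latency.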

Findings

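01. GPU acceleration yields up to 3.5x speedup for the offloaded computations when CPU-side preparation and post-processing of data transfers are included, and up to 7.5x when they are excluded.

02. The full simulation runs 1.8x faster on a single Grace-Hopper superchip.

03. GPU acceleration improves energy efficiency by 29%.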

Abstract

This paper highlights first steps towards enabling graphics processing unit (GPU) acceleration of the task-parallel smoothed particle hydrodynamics (SPH) solver SWIFT. Novel combinations of algorithms are presented, enabling SWIFT to function as truly heterogeneous software, leveraging task-parallelism on CPUs for memory-bound computations concurrently with GPUs for compute-bound computations while minimising the effects of CPU-GPU communication latency. The proposed algorithms are validated in extensive testing. The GPU acceleration methodology is shown to deliver speedups of up to 3.5x and 7.5x for the offloaded computations when including and excluding, respectively, the time required to prepare and post-process data transfers on the CPU side. The overall performance of the GPU-accelerated hydrodynamic solver for a full simulation on a single Grace-Hopper superchip is 1.8 times faster…
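The two speedup figures in the abstract can be related with a little arithmetic. The sketch below is not from the paper: it assumes that both peak figures (3.5x including CPU-side preparation, 7.5x excluding it) were measured on the same workload against the same CPU baseline, which the abstract does not state. Under that assumption it estimates what share of the offloaded path's wall time goes to preparing and post-processing transfers on the CPU. The function name `cpu_prep_share` is hypothetical.

```python
def cpu_prep_share(speedup_incl: float, speedup_excl: float) -> float:
    """Estimate the fraction of offloaded-path time spent on CPU-side
    transfer preparation and post-processing.

    Assumption (not confirmed by the source): both speedups are measured
    against the same baseline time on the same workload.
    """
    if not (0 < speedup_incl <= speedup_excl):
        raise ValueError("expected 0 < speedup_incl <= speedup_excl")
    # Normalise the baseline time to 1, so time = 1 / speedup.
    t_incl = 1.0 / speedup_incl  # offloaded path, including CPU-side prep
    t_excl = 1.0 / speedup_excl  # offloaded path, excluding CPU-side prep
    return (t_incl - t_excl) / t_incl


share = cpu_prep_share(3.5, 7.5)
print(f"{share:.1%}")  # prints 53.3%
```

If the assumption holds, roughly half of the offloaded path's time would be CPU-side data handling, which is consistent with the abstract's emphasis on minimising CPU-GPU communication overheads.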
