SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores

Matthieu Schaller (1), Pedro Gonnet (2,3), Aidan B. G. Chalk (2), Peter W. Draper (1) ((1) ICC, Durham University; (2) ECS, Durham University; (3) Google Switzerland GmbH)

arXiv:1606.02738·cs.DC·August 3, 2022

TL;DR

SWIFT is a new open-source cosmological simulation code that achieves excellent strong scaling on large core counts by combining task-based parallelism, graph-based domain decomposition, and asynchronous communication, without architecture-specific optimizations.

Contribution

The paper introduces SWIFT, a particle-based hydrodynamics code that employs novel task-based parallelism and dynamic domain decomposition for scalable performance on supercomputers.
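As a rough illustration of what task-based parallelism means in general, the sketch below runs a small dependency graph of tasks on a thread pool, starting each task as soon as its prerequisites have finished. It is not SWIFT's scheduler: the function `run_task_graph` and the task names `density`, `force`, and `kick` are hypothetical, chosen only to resemble the stages of a particle-based hydrodynamics step.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_task_graph(tasks, deps, workers=4):
    """Run the callables in `tasks`, honouring `deps`.

    tasks: dict mapping a task name to a zero-argument callable.
    deps:  dict mapping a task name to the names it must wait for.
    Returns the task names in the order they completed.
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    completed = []
    pending = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites are all done.
            for name in sorter.get_ready():
                pending[pool.submit(tasks[name])] = name
            # Block until at least one running task finishes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                name = pending.pop(future)
                completed.append(name)
                sorter.done(name)  # may unlock dependent tasks
    return completed


# Hypothetical three-stage step: each stage depends on the previous one.
log = []
tasks = {
    "density": lambda: log.append("density"),
    "force": lambda: log.append("force"),
    "kick": lambda: log.append("kick"),
}
deps = {"force": {"density"}, "kick": {"force"}}
order = run_task_graph(tasks, deps)
```

Because no worker waits at a global barrier, idle threads pick up whatever task is ready next, which is the property that gives this style of scheduling its fine-grained load balancing.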

Findings

01. Achieves over 60% parallel efficiency when the core count is increased 512-fold.

02. Demonstrates strong scaling on both x86 and Power8 architectures.

03. Uses fully asynchronous communication integrated into task scheduling.
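For readers unfamiliar with the metric, strong-scaling parallel efficiency is commonly defined as the speed-up over a reference run divided by the increase in core count, for a fixed problem size. The sketch below uses that standard definition; the helper name and the timings are made up for illustration and are not measurements from the paper.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Parallel efficiency of a run on `n` cores relative to a reference run.

    t_ref: wall-clock time of the reference run on n_ref cores.
    t_n:   wall-clock time of the same problem on n cores.
    A value of 1.0 means ideal scaling.
    """
    if min(t_ref, n_ref, t_n, n) <= 0:
        raise ValueError("times and core counts must be positive")
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical timings: 512 times more cores, run time drops from 512 s to 1.6 s.
efficiency = strong_scaling_efficiency(t_ref=512.0, n_ref=1, t_n=1.6, n=512)
```

With these invented numbers the efficiency comes out at 62.5%, which is the kind of figure the "over 60% at a 512-fold increase" finding refers to.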

Abstract

We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smooth Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100-supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (1) Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores. (2) Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, as opposed to just the data, as is the case with most partitioning schemes, is equally…
