Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Fabio Baruffa; Luigi Iapichino; Nicolay J. Hammer; Vasileios Karakasis

arXiv:1612.06090 · cs.DC · September 27, 2017

TL;DR

This paper presents code optimisation strategies for Smoothed Particle Hydrodynamics algorithms on multi/many-core Intel architectures, reporting speedups of 2.6× on Ivy Bridge, 13.7× on Knights Corner, and 19.1× on Knights Landing.

Contribution

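The work introduces a code modernisation approach for a representative SPH kernel isolated from Gadget, combining threading parallelism optimisation, a Structure of Arrays (SoA) data layout, auto-vectorisation, and algorithmic improvements in the particle sorting. The optimised kernel runs faster and scales better across threads on both Intel Xeon and Xeon Phi systems.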

Findings

01  2.6× speedup on Ivy Bridge (Intel Xeon)

02  13.7× speedup on Knights Corner (Xeon Phi)

03  19.1× speedup on Knights Landing (second-generation Xeon Phi)

Abstract

We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution time and improved threading scalability both on Intel Xeon® ($2.6\times$ on Ivy Bridge) and Xeon Phi™ ($13.7\times$ on Knights Corner) systems. First few tests of the optimised code result in $19.1\times$ faster execution on second generation Xeon Phi (Knights Landing), thus demonstrating the…
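To illustrate the data layout change mentioned in the abstract, the following is a minimal sketch in C contrasting an Array of Structures (AoS) with a Structure of Arrays (SoA). It is not taken from Gadget or from the paper: the type names (`particle_aos`, `particles_soa`), the function names, and the toy quantity being summed (mass times squared distance to a point) are all hypothetical and chosen only to show why the SoA form tends to be easier for a compiler to auto-vectorise.

```c
#include <stddef.h>

/* AoS: all fields of one particle sit next to each other in memory.
 * Hypothetical layout, not the actual Gadget particle structure. */
typedef struct {
    float pos[3];
    float mass;
} particle_aos;

/* SoA: each field is stored in its own contiguous array. */
typedef struct {
    float *x, *y, *z;
    float *mass;
} particles_soa;

/* Toy reduction over AoS data: sum of mass * |r - c|^2.
 * Consecutive iterations read memory with a stride of
 * sizeof(particle_aos), which usually forces gather-style loads. */
float weighted_r2_aos(const particle_aos *p, size_t n,
                      float cx, float cy, float cz)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float dx = p[i].pos[0] - cx;
        float dy = p[i].pos[1] - cy;
        float dz = p[i].pos[2] - cz;
        acc += p[i].mass * (dx * dx + dy * dy + dz * dz);
    }
    return acc;
}

/* Same reduction over SoA data: every array is read with unit stride,
 * so the loop maps directly onto contiguous vector loads. */
float weighted_r2_soa(const particles_soa *s, size_t n,
                      float cx, float cy, float cz)
{
    const float *x = s->x, *y = s->y, *z = s->z, *m = s->mass;
    float acc = 0.0f;
    /* Hint only; the pragma is ignored if OpenMP SIMD is not enabled. */
    #pragma omp simd reduction(+:acc)
    for (size_t i = 0; i < n; ++i) {
        float dx = x[i] - cx;
        float dy = y[i] - cy;
        float dz = z[i] - cz;
        acc += m[i] * (dx * dx + dy * dy + dz * dz);
    }
    return acc;
}
```

Both functions compute the same quantity; only the memory access pattern differs. With floating-point data, a vectorised reduction may sum terms in a different order, so results can differ in the last bits between the two versions. Whether a given compiler actually vectorises either loop depends on its flags and target, which is best checked with the compiler's vectorisation report.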
