Adapting AREPO-RT for Exascale Computing: GPU Acceleration and Efficient Communication

Oliver Zier; Rahul Kannan; Aaron Smith; Mark Vogelsberger; Erkin Verbeek

arXiv:2404.17630 · astro-ph.IM · April 30, 2024

Open Access

TL;DR

This paper enhances the AREPO-RT code for exascale supercomputers by implementing GPU acceleration and optimized communication strategies, significantly improving simulation speed and scalability for astrophysical radiative transfer modeling.

Contribution

The paper introduces GPU-based computation and a novel node-to-node communication method to optimize AREPO-RT for exascale architectures, enabling faster and more scalable astrophysical simulations.

Findings

01. The GPU implementation yields a ~15x speedup on benchmarks.

02. The communication optimizations improve performance on both large- and small-scale systems.

03. Overall efficiency triples in cosmological simulations of the Epoch of Reionization.

Abstract

Radiative transfer (RT) is a crucial ingredient for self-consistent modelling of numerous astrophysical phenomena across cosmic history. However, on-the-fly integration into radiation-hydrodynamics (RHD) simulations is computationally demanding, particularly due to the stringent time-stepping conditions and increased dimensionality inherent in multi-frequency collisionless Boltzmann physics. The emergence of exascale supercomputers, equipped with extensive CPU cores and GPU accelerators, offers new opportunities for enhancing RHD simulations. We present a novel optimization of AREPO-RT explicitly tailored for such high-performance computing environments. We implement a novel node-to-node communication strategy that utilizes shared memory to substitute intra-node communication with direct memory access. Furthermore, combining multiple inter-node messages into a single message…
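The communication idea in the abstract (shared memory within a node, one combined message between nodes) can be illustrated with a small bookkeeping sketch. This is not AREPO-RT code and makes no MPI calls: the function names `node_of` and `split_and_bundle`, the `(src_rank, dst_rank, payload)` tuple layout, and the assumption that ranks are assigned to nodes in contiguous blocks are all hypothetical, chosen only to show how per-rank messages might be sorted into intra-node transfers and per-node-pair bundles.

```python
from collections import defaultdict


def node_of(rank, ranks_per_node):
    """Map a rank to its node, ASSUMING ranks are placed in contiguous blocks."""
    return rank // ranks_per_node


def split_and_bundle(messages, ranks_per_node):
    """Sort per-rank messages into intra-node and bundled inter-node traffic.

    messages: list of (src_rank, dst_rank, payload) tuples (hypothetical layout).
    Returns (local, remote):
      local  -- messages whose source and destination share a node; in a real
                code these could be served by direct memory access instead of
                a network message.
      remote -- dict mapping (src_node, dst_node) to the list of messages that
                would travel together as ONE combined inter-node message.
    """
    local = []
    remote = defaultdict(list)
    for src, dst, payload in messages:
        src_node = node_of(src, ranks_per_node)
        dst_node = node_of(dst, ranks_per_node)
        if src_node == dst_node:
            local.append((src, dst, payload))
        else:
            remote[(src_node, dst_node)].append((src, dst, payload))
    return local, dict(remote)


if __name__ == "__main__":
    # Five rank-to-rank messages on a toy machine with 4 ranks per node.
    msgs = [(0, 1, "a"), (0, 4, "b"), (1, 5, "c"), (2, 9, "d"), (6, 3, "e")]
    local, remote = split_and_bundle(msgs, ranks_per_node=4)
    naive_inter_node = len(msgs) - len(local)
    print("intra-node transfers:", len(local))
    print("inter-node messages without bundling:", naive_inter_node)
    print("inter-node messages with bundling:", len(remote))
```

In this toy input, one message stays on its node, and the four messages that cross nodes collapse into three bundles because ranks 0 and 1 both send from node 0 to node 1. The saving grows with the number of ranks per node, since many rank pairs then share the same node pair.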

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques