Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems
Jeremy J. Williams, Jordy Trilaksono, Stefan Costea, Yi Ju, Luca Pennati, Jonah Ekelund, David Tskhakaya, Leon Kos, Ales Podolnik, Jakub Hromadka, Allen D. Malony, Sameer Shende, Tilman Dannert, Frank Jenko, Erwin Laure, Stefano Markidis

TL;DR
This paper presents a portable multi-GPU hybrid MPI+OpenMP implementation of the PIC Monte Carlo code BIT1 that scales on heterogeneous exascale systems while reducing data movement and synchronization overheads.
Contribution
A novel MPI+OpenMP hybrid approach enabling scalable, portable PIC MC simulations across Nvidia and AMD GPUs with optimized data transfer and I/O strategies.
Findings
Achieved scalable performance on up to 16,000 GPUs on exascale systems.
Demonstrated significant improvements in run time and resource utilization.
Enabled efficient large-scale plasma physics simulations.
Abstract
Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using…
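The combination of device-resident data, a contiguous one-dimensional layout, and OpenMP target tasks with explicit dependencies can be illustrated with a minimal sketch. This is not BIT1 code: the function push_and_wrap, its parameters, the chunking scheme, and the periodic wrap step are all hypothetical and chosen only for illustration, and the sketch uses a single device rather than several. It assumes a compiler with OpenMP offload support; without one, the pragmas are ignored or fall back to the host and the result is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not taken from BIT1.
 * Advances n particle positions x by v*dt, then wraps them into the
 * periodic domain [0, length). Positions and velocities live in flat,
 * contiguous 1D arrays. Assumes each particle moves by less than one
 * domain length per step. */
void push_and_wrap(double *x, const double *v, size_t n,
                   double dt, double length, int nchunks)
{
    assert(length > 0.0);
    if (nchunks < 1) nchunks = 1;
    const size_t chunk = (n + (size_t)nchunks - 1) / (size_t)nchunks;

    /* Map the arrays once; they stay on the device for both kernels. */
    #pragma omp target data map(tofrom: x[0:n]) map(to: v[0:n])
    {
        for (int c = 0; c < nchunks; ++c) {
            const size_t lo = (size_t)c * chunk;
            if (lo >= n) break;
            const size_t hi = (lo + chunk < n) ? lo + chunk : n;

            /* Stage 1: asynchronous push of this chunk. */
            #pragma omp target teams distribute parallel for nowait depend(inout: x[lo])
            for (size_t i = lo; i < hi; ++i)
                x[i] += v[i] * dt;

            /* Stage 2: wrap; the depend clause orders it after stage 1
             * of the same chunk, while other chunks may run concurrently. */
            #pragma omp target teams distribute parallel for nowait depend(inout: x[lo])
            for (size_t i = lo; i < hi; ++i) {
                if (x[i] >= length) x[i] -= length;
                else if (x[i] < 0.0) x[i] += length;
            }
        }
        /* Wait for all deferred target tasks before data is copied back. */
        #pragma omp taskwait
    }
}
```

Using the first element of each chunk as the dependency object keeps the two kernels of a chunk ordered without serializing different chunks, which is the property that lets a runtime overlap independent work. In a multi-GPU setting one would additionally select a device per chunk and map data per device; that part is omitted here.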
