Optimizing the hybrid parallelization of BHAC
Salvatore Cielo, Oliver Porth, Luigi Iapichino, Anupam Karmakar, Hector Olivares, Chun Xia

TL;DR
This paper details the process of modernizing the BHAC code's hybrid MPI+OpenMP parallelization, improving its efficiency and scalability on x86 and ARM architectures through profiling, optimization, and testing.
Contribution
It introduces specific optimization strategies for hybrid parallelization in BHAC, enhancing performance and scalability across different hardware architectures.
Findings
Performance improved by approximately 28%.
Enhanced scalability on hundreds of supercomputer nodes.
Successful porting of optimizations to ARM A64FX architecture.
Abstract
We present our experience with the modernization of the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools usable on x86 (Intel-based) architectures. Our performance characterization and threading analysis provided guidance in improving the concurrency and thus the efficiency of the OpenMP parallel regions. We assess scaling and communication patterns in order to identify and alleviate MPI bottlenecks, with both runtime switches and precise code interventions. The performance of the optimized version of BHAC improved by approximately 28%, making it viable for scaling on several hundreds of supercomputer nodes. We finally test whether porting such optimizations to different hardware is likewise beneficial on the new architecture by running on ARM A64FX vector nodes.
Taxonomy
Topics: Gamma-ray bursts and supernovae · Magnetic confinement fusion research · Pulsars and Gravitational Waves Research
