GreeM : Massively Parallel TreePM Code for Large Cosmological N-body Simulations
Tomoaki Ishiyama, Toshiyuki Fukushige, Junichiro Makino

TL;DR
GreeM is a massively parallel TreePM code for large cosmological N-body simulations. It loses only about 4% of its performance to load imbalance on more than 10^3 CPU cores and runs efficiently on PC clusters and supercomputers such as the Cray XT4.
Contribution
This paper introduces GreeM, a massively parallel TreePM code that uses a recursive multi-section algorithm for domain decomposition and adjusts domain sizes so that the force calculation takes the same time on every process.
Findings
Achieves 5 \times 10^4 particles per second per CPU core on Cray XT4 (opening angle \theta=0.5, more than 10^6 particles per CPU core)
Loses only about 4% of performance to non-optimal load balancing, even with more than 10^3 CPU cores
Runs efficiently on PC clusters and massively parallel computers such as the Cray XT4
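To put the quoted throughput in perspective, the short sketch below turns it into a rough wall-clock estimate for one force calculation. The function name and the assumption that the measured rate holds uniformly on every core are illustrative and not taken from the paper.

```python
def force_step_seconds(n_particles, n_cores, rate_per_core=5.0e4):
    """Rough time for one force calculation over all particles.

    rate_per_core is the measured Cray XT4 figure quoted above
    (particles per second per CPU core). Assuming it applies equally
    to every core is a simplification made for this estimate.
    """
    return n_particles / (n_cores * rate_per_core)


# Hypothetical run: 1024 cores holding 10^6 particles each.
estimate = force_step_seconds(1024 * 10**6, 1024)
print(f"{estimate:.1f} s per force calculation")
```

At 10^6 particles per core the estimate depends only on the per-core rate, since the particle count and the core count scale together.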
Abstract
In this paper, we describe the implementation and performance of GreeM, a massively parallel TreePM code for large-scale cosmological N-body simulations. GreeM uses a recursive multi-section algorithm for domain decomposition. The sizes of the domains are adjusted so that the total calculation time of the force becomes the same for all processes. The loss of performance due to non-optimal load balancing is around 4%, even for more than 10^3 CPU cores. GreeM runs efficiently on PC clusters and massively parallel computers such as a Cray XT4. The measured calculation speed on Cray XT4 is 5 \times 10^4 particles per second per CPU core, for the case of an opening angle of \theta=0.5, if the number of particles per CPU core is larger than 10^6.
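The idea of a recursive multi-section decomposition with cost-balanced domains can be illustrated with a small serial sketch. This is not GreeM's implementation: the function names, the per-particle cost values (a hypothetical stand-in for measured force-calculation time), and the 2 x 2 x 2 division are all assumptions chosen for illustration.

```python
import random


def split_equal_cost(items, key, n_parts):
    """Split (position, cost) items into n_parts groups of roughly equal total cost.

    Items are sorted by `key` (a coordinate), so each group is a contiguous slab.
    """
    ordered = sorted(items, key=key)
    total = sum(cost for _, cost in ordered)
    parts, current, running = [], [], 0.0
    for item in ordered:
        current.append(item)
        running += item[1]
        # Close the slab once cumulative cost reaches the next equal-cost mark.
        if len(parts) < n_parts - 1 and running >= total * (len(parts) + 1) / n_parts:
            parts.append(current)
            current = []
    parts.append(current)
    # Pad with empty slabs if there were too few items to fill every part.
    while len(parts) < n_parts:
        parts.append([])
    return parts


def multisection(items, divisions, axis=0):
    """Recursively split along x, then y, then z; return a flat list of domains."""
    if axis == len(divisions):
        return [items]
    domains = []
    slabs = split_equal_cost(items, key=lambda it: it[0][axis], n_parts=divisions[axis])
    for slab in slabs:
        domains.extend(multisection(slab, divisions, axis + 1))
    return domains


# Hypothetical input: random positions in the unit cube with made-up costs.
random.seed(0)
particles = [
    ((random.random(), random.random(), random.random()), 1.0 + 9.0 * random.random())
    for _ in range(4000)
]
domains = multisection(particles, divisions=(2, 2, 2))
costs = [sum(cost for _, cost in domain) for domain in domains]
print(len(domains), "domains; cost spread:", max(costs) - min(costs))
```

Each level cuts along one axis into slabs of equal cumulative cost, so domains in expensive regions end up spatially smaller. A parallel code would repeat such an adjustment as the particle distribution evolves, using measured timings rather than fixed weights.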
