A Parallel TreePM Code
Suryadeep Ray, J.S. Bagla

TL;DR
This paper introduces a parallel TreePM algorithm that combines functional and domain decompositions, reaching a speedup of 31.4 on 33 processors for a 128^3 particle simulation on a Linux cluster.
Contribution
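The paper presents a parallelization approach for the TreePM code that uses both functional and domain decompositions. Functional decomposition separates the long range force computation, the short range force computation, and the coordination of communications; domain decomposition is applied to the time-consuming short range force calculation.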
Findings
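Speedup of 31.4 for a 128^3 particle simulation on 33 processors, with better speedup for larger simulations
Time taken for one time step per particle is 6.5 microseconds for a 256^3 particle simulation on 65 processors
A 256^3 particle simulation of 4000 time steps takes 5 days on the tested cluster with 65 processors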
Abstract
We present an algorithm for parallelising the TreePM code. We use both functional and domain decompositions. Functional decomposition is used to separate the computation of long range and short range forces, as well as the task of coordinating communications between different components. Short range force calculation is time consuming and benefits from the use of domain decomposition. We have tested the code on a Linux cluster. We get a speedup of 31.4 for a 128^3 particle simulation on 33 processors, with the speedup being better for larger simulations. The time taken for one time step per particle is 6.5 microseconds for a 256^3 particle simulation on 65 processors; thus a simulation that runs for 4000 time steps takes 5 days on this cluster.
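The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and not taken from the paper: the function names are hypothetical, it assumes the 6.5 microseconds is wall-clock time per particle per time step, and it assumes the speedup is measured against a single-processor run with all 33 processors counted when forming the efficiency.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract.
# Function names are hypothetical; the inputs are the abstract's figures.

SECONDS_PER_DAY = 86_400


def parallel_efficiency(speedup, n_procs):
    """Fraction of ideal linear speedup (assumes a single-processor baseline)."""
    return speedup / n_procs


def wall_clock_days(time_per_particle_step_s, n_particles, n_steps):
    """Total run time in days, assuming the cost per particle per step is constant."""
    total_seconds = time_per_particle_step_s * n_particles * n_steps
    return total_seconds / SECONDS_PER_DAY


# 128^3 particle run: speedup of 31.4 on 33 processors.
efficiency = parallel_efficiency(31.4, 33)

# 256^3 particle run: 6.5 microseconds per particle per step, 4000 steps.
days = wall_clock_days(6.5e-6, 256**3, 4000)

print(f"parallel efficiency: {efficiency:.3f}")
print(f"estimated run time: {days:.2f} days")
```

Under these assumptions the estimated run time comes out at roughly five days, consistent with the abstract, and the efficiency is about 95% of ideal linear scaling.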
Taxonomy
Topics: Particle accelerators and beam dynamics · Superconducting Materials and Applications · Magnetic confinement fusion research
