
TL;DR
This paper evaluates a modification to the TreePM code that groups particles so that a single tree walk computes the force for every particle in a group. It demonstrates significant speed improvements when grouping is combined with individual time steps and cache optimization.
Contribution
It provides the first detailed performance analysis of a particle grouping modification in TreePM codes and shows how to optimally combine it with individual time steps.
Findings
Grouping particles, if tuned properly, speeds up the TreePM code significantly.
Combining grouping with individual time steps gives at least a factor of two speedup.
Cache optimization further enhances overall speed.
Abstract
We discuss the performance characteristics of using the modification of the tree code suggested by Barnes \citep{1990JCoPh..87..161B} in the context of the TreePM code. The optimisation involves identifying groups of particles and using only one tree walk to compute the force for all the particles in the group. This modification has been in use in our implementation of the TreePM code for some time, and has also been used by others in codes that make use of tree structures. In this paper, we present the first detailed study of the performance characteristics of this optimisation. We show that the modification, if tuned properly, can speed up the TreePM code by a significant amount. We also combine this modification with the use of individual time steps and indicate how to combine these two schemes in an optimal fashion. We find that the combination is at least a factor of two faster than the…
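The grouping idea in the abstract can be illustrated with a minimal sketch of a Barnes-style grouped tree walk. This is not the authors' TreePM implementation. It assumes a plain octree with monopole cells, treats each leaf of at most `n_group` particles as one group, and uses softened Newtonian gravity with G = 1. It omits the PM long-range part and any short-range cutoff a TreePM code would apply. All function and parameter names (`grouped_forces`, `interaction_list`, `n_group`, `theta`, `eps`) are hypothetical. The key point is that the opening test uses the distance from a cell to the nearest point of the group's bounding box. A cell accepted once is therefore acceptable for every member, so the whole group shares one interaction list.

```python
import math
import random


class Cell:
    """Octree node storing a monopole (total mass and centre of mass)."""

    def __init__(self, idx, pos, mass, center, half, leaf_size):
        self.idx = idx                          # indices of particles inside this cell
        self.half = half                        # half of the cell's side length
        self.mass = sum(mass[i] for i in idx)   # assumes positive masses
        self.com = [sum(mass[i] * pos[i][k] for i in idx) / self.mass for k in range(3)]
        self.children = []
        if len(idx) > leaf_size and half > 1e-12:
            octants = {}
            for i in idx:
                octants.setdefault(tuple(pos[i][k] > center[k] for k in range(3)), []).append(i)
            for key, sub in octants.items():
                c = [center[k] + (half / 2 if key[k] else -half / 2) for k in range(3)]
                self.children.append(Cell(sub, pos, mass, c, half / 2, leaf_size))


def interaction_list(root, lo, hi, theta):
    """One tree walk shared by every particle inside the group box [lo, hi]."""
    cells, parts, stack = [], [], [root]
    while stack:
        c = stack.pop()
        # Distance from the cell's centre of mass to the NEAREST point of the group box,
        # so a cell accepted here satisfies size/r < theta for every group member.
        d = math.sqrt(sum(max(lo[k] - c.com[k], 0.0, c.com[k] - hi[k]) ** 2 for k in range(3)))
        if 2 * c.half < theta * d:
            cells.append(c)             # far from the whole group: use its monopole
        elif c.children:
            stack.extend(c.children)    # too close: open the cell
        else:
            parts.extend(c.idx)         # nearby leaf: direct particle-particle terms
    return cells, parts


def accel(i, pos, mass, cells, parts, eps):
    """Softened Newtonian acceleration on particle i (G = 1) from a shared list."""
    sources = [(c.com, c.mass) for c in cells] + [(pos[j], mass[j]) for j in parts if j != i]
    a = [0.0, 0.0, 0.0]
    for src, m in sources:
        dx = [src[k] - pos[i][k] for k in range(3)]
        r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
        f = m / (r2 * math.sqrt(r2))
        for k in range(3):
            a[k] += f * dx[k]
    return a


def grouped_forces(pos, mass, theta=0.5, n_group=8, eps=1e-3):
    """Return (accelerations, number_of_tree_walks). Keep theta < 1/sqrt(3) so a
    group can never accept one of its own ancestor cells (which contain it)."""
    lo = [min(p[k] for p in pos) for k in range(3)]
    hi = [max(p[k] for p in pos) for k in range(3)]
    center = [(lo[k] + hi[k]) / 2 for k in range(3)]
    half = max(hi[k] - lo[k] for k in range(3)) / 2 + 1e-9
    root = Cell(list(range(len(pos))), pos, mass, center, half, n_group)
    acc, walks, stack = [None] * len(pos), 0, [root]
    while stack:
        c = stack.pop()
        if c.children:
            stack.extend(c.children)
            continue
        # Each leaf (at most n_group particles) is one group: a single walk per group.
        glo = [min(pos[i][k] for i in c.idx) for k in range(3)]
        ghi = [max(pos[i][k] for i in c.idx) for k in range(3)]
        cells, parts = interaction_list(root, glo, ghi, theta)
        walks += 1
        for i in c.idx:
            acc[i] = accel(i, pos, mass, cells, parts, eps)
    return acc, walks


def direct_forces(pos, mass, eps=1e-3):
    """O(N^2) reference: every other particle is a direct source."""
    everyone = list(range(len(pos)))
    return [accel(i, pos, mass, [], everyone, eps) for i in range(len(pos))]


if __name__ == "__main__":
    random.seed(1)
    n = 400
    pos = [[random.random() for _ in range(3)] for _ in range(n)]
    mass = [1.0 / n] * n
    approx, walks = grouped_forces(pos, mass, theta=0.5)
    exact = direct_forces(pos, mass)
    err = sum(math.dist(a, b) for a, b in zip(approx, exact))
    norm = sum(math.hypot(*b) for b in exact)
    print(f"{walks} tree walks for {n} particles, relative force error {err / norm:.2e}")
```

With `theta=0` every cell is opened, and the sketch reduces to direct summation. Raising `n_group` lowers the number of walks but lengthens each shared list with terms some members do not strictly need. This is the kind of trade-off the abstract refers to when it says the modification must be tuned properly.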
