High-Performance Small-Scale Simulation of Star Cluster Evolution on Cray XD1
Keigo Nitadori, Junichiro Makino, George Abe

TL;DR
This paper demonstrates a highly efficient small-scale N-body simulation of star cluster evolution on a Cray XD1 system, achieving 2.03 Tflops (57.7% of theoretical peak) with 64k particles, well above the efficiency previously reported for calculations of this size.
Contribution
The paper presents a novel high-performance implementation of N-body simulation for small particle counts on a Cray XD1, utilizing a scalable 2D parallelization scheme and low-latency communication.
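As a rough illustration of what a two-dimensional decomposition of a direct-summation force calculation can look like, the sketch below emulates an r x r process grid in a single Python process. It is not the authors' implementation: the function names (`pairwise_acc`, `split_blocks`, `acc_2d_grid`), the Plummer softening `eps`, the unit system (G = 1), and the random test data are all assumptions made for the example. Cell (a, b) of the grid computes the forces on particle block a due to block b, and the partial results are summed along each grid row, which stands in for the reduction a real parallel code would do over the network.

```python
import random

def pairwise_acc(targets, sources, masses, eps=1e-2):
    """Accelerations on `targets` due to `sources` (G = 1, softening eps)."""
    out = []
    for xi in targets:
        ax = ay = az = 0.0
        for xj, mj in zip(sources, masses):
            dx, dy, dz = xj[0] - xi[0], xj[1] - xi[1], xj[2] - xi[2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            w = mj / (r2 * r2 ** 0.5)  # m_j / r^3; self-pairs add zero force
            ax += w * dx
            ay += w * dy
            az += w * dz
        out.append((ax, ay, az))
    return out

def split_blocks(n, r):
    """Index ranges dividing n particles into r nearly equal blocks."""
    bounds = [n * k // r for k in range(r + 1)]
    return [range(bounds[k], bounds[k + 1]) for k in range(r)]

def acc_2d_grid(pos, mass, r, eps=1e-2):
    """Emulate an r x r grid: cell (a, b) handles targets a, sources b."""
    blocks = split_blocks(len(pos), r)
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for a in range(r):                      # grid row: target block
        tgt = [pos[i] for i in blocks[a]]
        for b in range(r):                  # grid column: source block
            src = [pos[j] for j in blocks[b]]
            msrc = [mass[j] for j in blocks[b]]
            partial = pairwise_acc(tgt, src, msrc, eps)
            # Stand-in for summing partial forces along the grid row.
            for i, p in zip(blocks[a], partial):
                for k in range(3):
                    acc[i][k] += p[k]
    return acc

random.seed(0)
n = 48
pos = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(n)]
mass = [1.0 / n] * n

full = pairwise_acc(pos, pos, mass)         # plain O(N^2) reference
grid = acc_2d_grid(pos, mass, r=4)          # 16 emulated grid cells
max_diff = max(abs(f[k] - g[k]) for f, g in zip(full, grid) for k in range(3))
print("largest difference between the two results:", max_diff)
```

In this layout each emulated cell touches only N/r targets and N/r sources, so the work per cell shrinks as 1/r^2 while the data each cell needs shrinks only as 1/r. That is one reason the speed of the row reduction matters more as the particle count per processor gets small.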
Findings
Achieved 2.03 Tflops, 57.7% of peak performance.
Surpassed previous efficiency records for small N-body simulations.
Demonstrated the importance of a low-latency communication network for parallel performance at small particle counts.
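The headline figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on this page (2.03 Tflops, 57.7%, and the 400 dual-core processors mentioned in the abstract); the variable names are illustrative, and the per-core values are derived here rather than taken from the paper.

```python
sustained_tflops = 2.03   # reported sustained performance
efficiency = 0.577        # reported fraction of theoretical peak
processors = 400          # dual-core Opteron processors (from the abstract)
cores = processors * 2

# Peak implied by the two reported figures.
implied_peak_tflops = sustained_tflops / efficiency

# Per-core view of the same numbers, in Gflops.
peak_per_core_gflops = implied_peak_tflops * 1000 / cores
sustained_per_core_gflops = sustained_tflops * 1000 / cores

print(f"implied peak: {implied_peak_tflops:.2f} Tflops")
print(f"peak per core: {peak_per_core_gflops:.2f} Gflops")
print(f"sustained per core: {sustained_per_core_gflops:.2f} Gflops")
```

The implied machine peak comes out at about 3.5 Tflops, or roughly 4.4 Gflops per core, with about 2.5 Gflops per core sustained.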
Abstract
In this paper, we describe the performance of an $N$-body simulation of a star cluster with 64k stars on a Cray XD1 system with 400 dual-core Opteron processors. A number of astrophysical $N$-body simulations were reported in SCxy conferences. All previous entries for Gordon-Bell prizes used at least 700k particles. The reason for this preference for large numbers of particles is parallel efficiency: it is very difficult to achieve high performance on large parallel machines if the number of particles is small. However, for many scientifically important problems the calculation cost scales as $N^{3.3}$, and it is very important to use large machines for a relatively small number of particles. We achieved 2.03 Tflops, or 57.7% of the theoretical peak performance, using a direct calculation with the individual timestep algorithm, on 64k particles. The best efficiency…
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
