Performance analysis of parallel gravitational $N$-body codes on large GPU cluster
Siyi Huang, Rainer Spurzem, Peter Berczik

TL;DR
This paper compares two GPU-optimized parallel gravitational N-body simulation codes, NBODY6++ and Bonsai, analyzing their performance, accuracy, and efficiency on large GPU clusters, revealing their strengths and differences for astrophysical modeling.
Contribution
It provides a detailed performance and accuracy comparison of two distinct GPU-accelerated N-body codes, highlighting their optimization levels and applicability.
Findings
Both codes reach roughly half of the peak single-precision performance of the underlying GPU cards.
A potential speed-up of 200-300 times with 1k GPUs is predicted.
Bonsai generally uses larger time steps and has higher energy errors than NBODY6++.
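The energy errors mentioned above are usually reported as the relative drift of the total energy over a run. The summary does not show how either code computes this, so the following is only a minimal sketch, assuming a direct-summation energy evaluation with $G = 1$ and an optional softening length; the function names `total_energy` and `relative_energy_error` are hypothetical and are not taken from NBODY6++ or Bonsai.

```python
import math


def total_energy(masses, positions, velocities, G=1.0, eps=0.0):
    """Kinetic plus pairwise potential energy by direct O(N^2) summation.

    eps is an assumed softening length (0.0 means an unsoftened potential).
    """
    kinetic = 0.0
    for m, v in zip(masses, velocities):
        kinetic += 0.5 * m * sum(c * c for c in v)

    potential = 0.0
    n = len(masses)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            potential -= G * masses[i] * masses[j] / math.sqrt(r2 + eps * eps)

    return kinetic + potential


def relative_energy_error(e_start, e_end):
    """Relative change |(E_end - E_start) / E_start| between two snapshots."""
    return abs((e_end - e_start) / e_start)


# Illustrative two-body system on a circular orbit (equal masses, separation 1).
masses = [0.5, 0.5]
positions = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
velocities = [(0.0, -0.5, 0.0), (0.0, 0.5, 0.0)]
e0 = total_energy(masses, positions, velocities)
```

In practice `e0` would be compared with the energy of a later snapshot through `relative_energy_error`; a code that takes larger time steps would be expected to show a larger value for the same model.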
Abstract
We compare the performance of two very different parallel gravitational $N$-body codes for astrophysical simulations on large GPU clusters, both pioneers in their own fields as well as in certain mutual scales: NBODY6++ and Bonsai. We carry out the benchmark of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs, as their performance has approached half of the maximum single-precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200-300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as…
