A sparse octree gravitational N-body code that runs entirely on the GPU processor
Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

TL;DR
This paper introduces a sparse octree gravitational N-body code that runs entirely on the GPU. It is more than a factor of 20 faster overall than tuned CPU code, processes more than 2.8 million particles per second, and its tree construction and traversal algorithms are portable to many-core devices that support CUDA or OpenCL.
Contribution
The authors develop parallel algorithms for sparse octree construction and traversal on GPUs, enabling high-performance gravitational simulations entirely on GPU hardware.
Findings
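The GPU tree-code outperforms tuned CPU code during tree construction
Overall speedup exceeds a factor of 20, giving a processing rate of more than 2.8 million particles per second
Tree construction and traversal algorithms are portable to many-core devices that support CUDA or OpenCL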
Abstract
We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which runs completely on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traversal algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second.
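To make "parallel-scan and sort methods" concrete, the following is a minimal serial sketch in Python of how sorting and an exclusive scan can combine to find the non-empty cells of one octree level. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which space-filling key is used, so Morton (Z-order) keys are assumed here, and the names `morton_key`, `exclusive_scan`, and `cells_at_level` are hypothetical.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer grid coordinates.

    Assumption: Morton (Z-order) keys; the abstract does not name the key type.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b + 2)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b)
    return key


def exclusive_scan(flags):
    """Serial stand-in for a parallel exclusive prefix sum.

    Returns (offsets, total), where offsets[i] is the sum of flags[:i].
    """
    offsets, total = [], 0
    for f in flags:
        offsets.append(total)
        total += f
    return offsets, total


def cells_at_level(sorted_keys, level, bits=10):
    """List the non-empty cells at `level` (root = 0) from sorted keys.

    Each cell is (key prefix, index of its first particle). A flag marks
    every position where the prefix changes; the scan of those flags gives
    each cell its slot in a compact output array.
    """
    shift = 3 * (bits - level)
    prefixes = [k >> shift for k in sorted_keys]
    flags = [1 if i == 0 or prefixes[i] != prefixes[i - 1] else 0
             for i in range(len(prefixes))]
    offsets, count = exclusive_scan(flags)
    cells = [None] * count
    for i, flag in enumerate(flags):
        if flag:
            cells[offsets[i]] = (prefixes[i], i)
    return cells


# Four particles on a 4x4x4 grid (2 bits per axis); sorting groups them by octant.
points = [(0, 0, 0), (3, 3, 3), (0, 0, 1), (3, 2, 3)]
keys = sorted(morton_key(x, y, z, bits=2) for x, y, z in points)
level1 = cells_at_level(keys, level=1, bits=2)
```

Every loop in the sketch is independent per element apart from the scan and the sort, which is why the approach maps onto data-parallel hardware: on a GPU the flagging and scatter steps would be one thread per particle, with the scan and sort provided by parallel primitives. Empty octants never receive a flag, so only occupied cells are stored, which is what makes the resulting tree sparse.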
