A GPU accelerated Barnes-Hut Tree Code for FLASH4
Gunther Lukat, Robi Banerjee

TL;DR
This paper introduces a GPU-accelerated Barnes-Hut tree code integrated into FLASH4 that speeds up the gravity unit by a factor of 3 to 60 relative to the existing FLASH4 gravity solvers, depending on the setup and the GPU/CPU ratio.
Contribution
The authors developed a CUDA-C Barnes-Hut tree code for calculating the gravitational potential on octree adaptive meshes and integrated it into the MPI-parallel FLASH4 AMR framework, where it outperforms the existing gravity solvers.
Findings
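Speedup of at least 3x for the gravity unit compared with the existing FLASH4 gravity solvers
Up to 60x speedup for the gravity unit, depending on the setup and the GPU/CPU ratio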
Overall simulation speedup up to 10x
Abstract
We present a GPU-accelerated CUDA-C implementation of the Barnes-Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and is therefore fully MPI parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a MacLaurin spheroid to test the accuracy of our new implementation, and we use spherical, collapsing cloud cores with effective AMR to carry out performance tests, again in comparison with the previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup…
