
TL;DR
This paper presents a parallel GPU implementation of the oct-tree method for fast particle interaction calculations, sustaining 21.8 Gflops at a cost per Gflops two and a half times better than the previous record from 2006.
Contribution
The paper introduces a highly efficient GPU implementation of the oct-tree method, enabling large-scale particle simulations with reduced computational time.
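For readers unfamiliar with the oct-tree method, the following is a minimal single-threaded sketch in Python of the general idea (a Barnes-Hut style tree with monopole terms only). It is not the paper's GPU implementation. The names `Cell`, `tree_accel`, `direct_accel`, the opening angle `theta`, the softening length `eps`, and the choice G = 1 are all assumptions made for illustration.

```python
import random


class Cell:
    """One cubic cell of the oct-tree (hypothetical helper, not from the paper)."""

    def __init__(self, center, half):
        self.center = center        # (x, y, z) of the cell centre
        self.half = half            # half of the cell edge length
        self.mass = 0.0             # total mass inside the cell
        self.com = [0.0, 0.0, 0.0]  # centre of mass of the cell contents
        self.children = None        # list of 8 sub-cells once subdivided
        self.body = None            # (position, mass) while the cell holds one particle

    def _child_for(self, pos):
        # Octant index uses one bit per axis.
        idx = sum(1 << k for k in range(3) if pos[k] >= self.center[k])
        if self.children[idx] is None:
            q = self.half / 2
            c = tuple(self.center[k] + (q if (idx >> k) & 1 else -q) for k in range(3))
            self.children[idx] = Cell(c, q)
        return self.children[idx]

    def insert(self, pos, m):
        # Assumes distinct positions; coincident particles would recurse forever.
        if self.children is None and self.body is None:
            self.body = (pos, m)                 # empty cell becomes a leaf
        else:
            if self.children is None:            # occupied leaf: subdivide it
                self.children = [None] * 8
                old_pos, old_m = self.body
                self.body = None
                self._child_for(old_pos).insert(old_pos, old_m)
            self._child_for(pos).insert(pos, m)
        total = self.mass + m                    # update mass and centre of mass
        for k in range(3):
            self.com[k] = (self.com[k] * self.mass + pos[k] * m) / total
        self.mass = total


def tree_accel(cell, pos, theta=0.5, eps=1e-3):
    """Return (acceleration at pos with G = 1, number of interactions evaluated)."""
    if cell is None or cell.mass == 0.0:
        return [0.0, 0.0, 0.0], 0
    if cell.children is None:                    # leaf: use the particle itself
        src, m = cell.body
        if src == pos:
            return [0.0, 0.0, 0.0], 0            # skip self-interaction
    else:
        src, m = cell.com, cell.mass
    d = [src[k] - pos[k] for k in range(3)]
    r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
    # Accept the cell as one pseudo-particle if (cell size / distance) < theta.
    if cell.children is None or (2 * cell.half) ** 2 < theta ** 2 * r2:
        f = m / (r2 + eps * eps) ** 1.5
        return [f * d[k] for k in range(3)], 1
    acc, count = [0.0, 0.0, 0.0], 0
    for child in cell.children:                  # otherwise open the cell
        a, c = tree_accel(child, pos, theta, eps)
        acc = [acc[k] + a[k] for k in range(3)]
        count += c
    return acc, count


def direct_accel(bodies, pos, eps=1e-3):
    """Brute-force reference: sum over every other particle."""
    acc = [0.0, 0.0, 0.0]
    for src, m in bodies:
        if src == pos:
            continue
        d = [src[k] - pos[k] for k in range(3)]
        f = m / (d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps * eps) ** 1.5
        acc = [acc[k] + f * d[k] for k in range(3)]
    return acc


# Toy data: random particles in the unit cube (illustrative only).
random.seed(1)
n = 2000
bodies = [((random.random(), random.random(), random.random()), 1.0 / n) for _ in range(n)]
root = Cell((0.5, 0.5, 0.5), 0.5)
for p, m in bodies:
    root.insert(p, m)

target = bodies[0][0]
approx, used = tree_accel(root, target)
exact = direct_accel(bodies, target)
print(f"tree used {used} interactions, direct sum used {n - 1}")
```

Setting `theta=0.0` forces every cell to be opened, so the tree walk degenerates into the direct sum. Larger values of `theta` trade accuracy for fewer interactions per particle, which is where the O(N log N) scaling comes from.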
Findings
Achieved a sustained speed of 21.8 Gflops on GPU.
Successfully simulated universe structure formation with 2.87 million particles.
Achieved a cost of $41.6 per Gflops, two and a half times better than the previous record from 2006.
Abstract
The kd-tree is a fundamental tool in computer science. Among its applications, the kd-tree search (oct-tree method) for fast evaluation of particle interactions and neighbor search is highly important, since it reduces the computational complexity of these problems from O(N^2) with a brute-force method to O(N log N), where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We successfully ran a simulation of structure formation in the universe very efficiently. On our system, which costs roughly $900, the run with N ~ 2.87x10^6 particles took 5.79 hours and executed 1.2x10^13 force evaluations in total. We obtained a sustained computing speed of 21.8 Gflops and a cost per Gflops of $41.6/Gflops, which is two and a half times better than the previous record in 2006.
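As a sanity check, the figures quoted in the abstract can be related to each other with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative, and the use of log base 2 with constant factors ignored in the complexity comparison is an assumption.

```python
import math

# Figures quoted in the abstract.
system_cost_usd = 900.0      # "roughly $900"
sustained_gflops = 21.8
wall_hours = 5.79
force_evaluations = 1.2e13
n_particles = 2.87e6

# Price/performance implied by the rounded system cost.
cost_per_gflops = system_cost_usd / sustained_gflops

# Total floating-point work implied by sustained speed and wall-clock time,
# and the operation count per force evaluation that this implies.
total_flop = sustained_gflops * 1e9 * wall_hours * 3600.0
flop_per_evaluation = total_flop / force_evaluations

# Rough scale of the complexity gain for a single force pass
# (assumption: constants ignored, log base 2).
brute_force_pairs = n_particles ** 2
tree_scale = n_particles * math.log2(n_particles)

print(f"cost per Gflops      ~ ${cost_per_gflops:.1f}")
print(f"flop per evaluation  ~ {flop_per_evaluation:.1f}")
print(f"N^2 / (N log2 N)     ~ {brute_force_pairs / tree_scale:.2e}")
```

With the rounded $900 the cost comes out slightly above $41 per Gflops, so the quoted $41.6 suggests an actual system cost a little over $900. The implied operation count per force evaluation lands close to 38, a figure commonly assumed for one pairwise gravitational interaction in N-body benchmarks; whether the authors used exactly that count is an assumption here.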
Taxonomy
Topics: Computational Physics and Python Applications · Algorithms and Data Compression · Scientific Research and Discoveries
