Implementation of a Parallel Tree Method on a GPU
Naohito Nakasato

TL;DR
This paper presents a parallel GPU implementation of the tree method, which uses kd-tree traversal to evaluate particle interactions efficiently. Its key optimization is localized particle ordering for better cache utilization, and it reports detailed performance measurements.
Contribution
The paper describes a parallel implementation of the kd-tree-based tree method on a Cypress GPU. It identifies localized particle ordering as a key optimization for cache utilization.
Findings
Executing the tree traversal of the force calculation on a GPU is practical and efficient.
Performance measurements demonstrate significant speedup over CPU methods.
Localized particle ordering improves cache efficiency and overall performance.
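Localized particle ordering can be illustrated with a Z-order (Morton) sort. This is one common way to make particles that are adjacent in memory also adjacent in space. It is an assumption here, because the summary does not say which ordering the paper uses. The intuition is as follows: when neighboring GPU threads handle nearby particles, their tree traversals visit mostly the same nodes, so those nodes are more likely to be cache hits. The helper names (`spread_bits`, `morton_key`, `morton_order`, `mean_step`) and the 10-bit-per-axis quantization are hypothetical.

```python
import math
import random

BITS = 10  # hypothetical choice: 10 bits per axis -> 30-bit Morton keys


def spread_bits(x):
    """Insert two zero bits between each of the low 10 bits of x (bit k -> bit 3k)."""
    x &= 0x3FF
    x = (x | (x << 16)) & 0x030000FF
    x = (x | (x << 8)) & 0x0300F00F
    x = (x | (x << 4)) & 0x030C30C3
    x = (x | (x << 2)) & 0x09249249
    return x


def morton_key(ix, iy, iz):
    """Interleave three 10-bit integer coordinates into one Z-order key."""
    return spread_bits(ix) | (spread_bits(iy) << 1) | (spread_bits(iz) << 2)


def morton_order(pos):
    """Return particle indices sorted along a Z-order curve (hypothetical helper)."""
    lo = [min(p[k] for p in pos) for k in range(3)]
    hi = [max(p[k] for p in pos) for k in range(3)]
    cells = (1 << BITS) - 1

    def key(i):
        # Quantize each coordinate of particle i to an integer in [0, 1023].
        q = [min(cells, int((pos[i][k] - lo[k]) / ((hi[k] - lo[k]) or 1.0) * (cells + 1)))
             for k in range(3)]
        return morton_key(*q)

    return sorted(range(len(pos)), key=key)


def mean_step(pos, order):
    """Average distance between consecutively processed particles."""
    return sum(math.dist(pos[a], pos[b]) for a, b in zip(order, order[1:])) / (len(order) - 1)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random(), random.random()) for _ in range(1000)]
    print("input order :", round(mean_step(pts, list(range(len(pts)))), 3))
    print("Morton order:", round(mean_step(pts, morton_order(pts)), 3))
```

A smaller `mean_step` after sorting indicates that consecutive particles, and therefore the threads assigned to them, are spatially close. On a real GPU, the particle arrays themselves would be permuted into this order before the force kernel runs.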
Abstract
The kd-tree is a fundamental tool in computer science. Among its many applications, kd-tree search (the tree method) is especially important for the fast evaluation of particle interactions and for neighbor search, because it reduces the computational complexity of these problems from O(N^2) for a brute-force method to O(N log N), where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We describe in detail how we implemented the tree method on a Cypress GPU. An optimization that we found important is localized particle ordering to effectively utilize cache memory. We present a number of test results and performance measurements. Our results show that the execution of the tree traversal in a force calculation on a GPU is practical and efficient.
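To show where the reduction from O(N^2) to O(N log N) comes from, here is a minimal serial sketch of the tree method. It assumes gravitational (1/r^2) interactions with G = 1, a hypothetical Plummer softening `EPS2`, a kd-tree that splits at the median of the longest axis, and a simple monopole opening criterion with parameter `theta`. None of these choices are taken from the paper, and the paper's GPU kernel is not reproduced here. The names `build`, `tree_accel`, `direct_accel`, and `LEAF_SIZE` are hypothetical.

```python
import math
import random

LEAF_SIZE = 8  # hypothetical: max particles per leaf
EPS2 = 1e-4    # hypothetical Plummer softening length squared (G = 1)


class Node:
    """kd-tree node owning the contiguous slice order[lo:hi] of particle indices."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.mass, self.com, self.size = 0.0, (0.0, 0.0, 0.0), 0.0


def build(pos, mass, order, lo, hi):
    """Recursively split order[lo:hi] at the median of the longest axis."""
    node = Node(lo, hi)
    idx = order[lo:hi]
    mins = [min(pos[i][k] for i in idx) for k in range(3)]
    maxs = [max(pos[i][k] for i in idx) for k in range(3)]
    extent = [maxs[k] - mins[k] for k in range(3)]
    node.size = max(extent)  # longest edge of the bounding box
    node.mass = sum(mass[i] for i in idx)
    node.com = tuple(sum(mass[i] * pos[i][k] for i in idx) / node.mass for k in range(3))
    if hi - lo > LEAF_SIZE:
        axis = extent.index(node.size)
        order[lo:hi] = sorted(idx, key=lambda i: pos[i][axis])
        mid = (lo + hi) // 2
        node.left = build(pos, mass, order, lo, mid)
        node.right = build(pos, mass, order, mid, hi)
    return node


def pair_accel(p, q, m):
    """Softened acceleration at p due to a point mass m at q."""
    dx, dy, dz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    r2 = dx * dx + dy * dy + dz * dz + EPS2
    f = m / (r2 * math.sqrt(r2))
    return (f * dx, f * dy, f * dz)


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def tree_accel(node, p, pos, mass, order, theta):
    """Traverse the tree, replacing well-separated nodes by their center of mass."""
    r = math.dist(p, node.com)
    if node.size < theta * r:  # opening criterion: node looks small from p
        return pair_accel(p, node.com, node.mass)
    if node.left is None:      # leaf that must be opened: sum its particles directly
        acc = (0.0, 0.0, 0.0)
        for i in order[node.lo:node.hi]:
            acc = add(acc, pair_accel(p, pos[i], mass[i]))
        return acc
    return add(tree_accel(node.left, p, pos, mass, order, theta),
               tree_accel(node.right, p, pos, mass, order, theta))


def direct_accel(p, pos, mass):
    """Brute force: O(N) work per particle, O(N^2) for all particles."""
    acc = (0.0, 0.0, 0.0)
    for q, m in zip(pos, mass):
        acc = add(acc, pair_accel(p, q, m))
    return acc


if __name__ == "__main__":
    random.seed(1)
    n = 400
    pos = [(random.random(), random.random(), random.random()) for _ in range(n)]
    mass = [1.0 / n] * n
    order = list(range(n))
    root = build(pos, mass, order, 0, n)
    print(direct_accel(pos[0], pos, mass), tree_accel(root, pos[0], pos, mass, order, 0.5))
```

Setting `theta = 0` never accepts a node, so the traversal reduces to the exact direct sum. Larger `theta` accepts more nodes as single interactions, so each particle visits roughly O(log N) nodes, at the cost of a small approximation error. Because `build` permutes `order` so that every node owns a contiguous slice, the same array is also a spatially localized particle order.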
