TL;DR
JZ-Tree is a GPU-optimized framework for spatial tree traversal built on a Morton (z-order) hierarchy. It supports algorithms such as k-nearest-neighbor search and clustering, and reports more than an order-of-magnitude speedup over existing GPU libraries on large datasets.
Contribution
The paper presents a novel Morton z-order based tree hierarchy optimized for GPU architectures, facilitating efficient dual-tree traversal and scalable spatial algorithms in JAX and CUDA.
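For readers unfamiliar with Morton (z-order) indexing, the following is a minimal sketch of the general idea in plain Python. It is not the paper's implementation: the function names (`quantize`, `morton3d`, `morton_sort`), the unit-cube assumption, and the 10-bit default resolution are all illustrative choices.

```python
def quantize(coord: float, bits: int) -> int:
    """Map a coordinate in [0, 1) to an integer cell index in [0, 2**bits)."""
    cells = 1 << bits
    # Clamp so that values slightly outside the unit interval stay in range.
    return min(max(int(coord * cells), 0), cells - 1)


def morton3d(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three cell indices into one code.

    Bit b of ix lands at position 3*b, of iy at 3*b + 1, of iz at 3*b + 2.
    """
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def morton_sort(points, bits: int = 10):
    """Return the points (assumed to lie in the unit cube) in z-order."""
    def key(p):
        x, y, z = p
        return morton3d(quantize(x, bits), quantize(y, bits),
                        quantize(z, bits), bits)
    return sorted(points, key=key)
```

Sorting by such a code tends to place spatially nearby points close together in memory, which is the property that makes a z-order layout attractive for coalesced GPU access.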
Findings
More than an order-of-magnitude speedup over existing GPU libraries on large datasets
Strong scaling demonstrated on multi-GPU systems
Open-source implementation available for broad algorithmic applications
Abstract
Algorithms based on spatial tree traversal are widely regarded as among the most efficient and flexible approaches for many problems in CPU-based high-performance computing (HPC). However, directly transferring these algorithms to GPU architectures often yields substantially smaller performance gains than expected in light of the high computational throughput of modern GPUs. The branching nature of tree algorithms leads to thread divergence and irregular memory access patterns -- both of which may severely limit GPU performance. To address these challenges, we propose a Morton (z-order) 'plane-based tree hierarchy' that is specifically designed for GPU architectures. The resulting flattened data layout enables efficient dual-tree traversal with collaborative execution across thread groups, leading to highly coalesced memory access patterns. Based on this framework we present…
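To illustrate why a z-order sort yields a flattened layout, the sketch below shows a generic property of Morton codes: every octree cell corresponds to one contiguous slice of the sorted code array, so a tree node can be stored as a pair of offsets rather than as pointers. This is a standard linear-octree illustration, not the paper's plane-based hierarchy; `node_range` and its parameters are hypothetical names, and the codes are assumed to be 3D Morton codes with `bits` bits per axis.

```python
from bisect import bisect_left


def node_range(sorted_codes, prefix: int, level: int, bits: int = 10):
    """Return (lo, hi) so that sorted_codes[lo:hi] lies in one octree cell.

    `prefix` holds the top 3*level bits of the Morton code that identify
    the cell at depth `level`; level 0 is the root with prefix 0.
    """
    shift = 3 * (bits - level)
    lo = bisect_left(sorted_codes, prefix << shift)
    hi = bisect_left(sorted_codes, (prefix + 1) << shift)
    return lo, hi


# Toy example with 2 bits per axis, so codes lie in [0, 64).
codes = [0, 1, 5, 8, 9, 63]
root = node_range(codes, prefix=0, level=0, bits=2)
children = [node_range(codes, prefix=p, level=1, bits=2) for p in range(8)]
```

In this toy example the eight child ranges tile the root range without gaps or overlap, and empty cells simply appear as zero-length slices.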
