Scalable $k$-d trees for distributed data
Aritra Chakravorty, William S. Cleveland, Patrick J. Wolfe

TL;DR
This paper introduces a scalable method for constructing $k$-d trees in distributed systems, using median approximation to enable efficient range searches, clustering, and nearest neighbor queries in large datasets.
Contribution
It presents a novel scalable approach for building $k$-d trees in distributed environments with theoretical guarantees and empirical validation.
Findings
The method achieves high accuracy in median approximation.
It demonstrates good scalability in large datasets.
The approach maintains theoretical quality guarantees.
Abstract
Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest neighbors search, local regression, and so forth. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees of the quality of approximation using this approach, along with a simulation study quantifying the accuracy and scalability of our proposed approach in practice.
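To make the idea of splitting on approximate medians concrete, here is a minimal Python sketch. The abstract does not say how the medians are approximated, so the histogram merge used below is an assumed stand-in, not the authors' method. The names `Node`, `approx_median`, `build`, `bins`, and `leaf_size` are hypothetical, and the "distributed" data is simulated as a list of in-memory chunks.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Chunks = List[List[Point]]  # one inner list per (simulated) data partition


@dataclass
class Node:
    size: int = 0                  # number of points under this node
    axis: int = -1                 # splitting coordinate, -1 for a leaf
    split: float = 0.0             # approximate median used as the cut
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def approx_median(chunks: Chunks, axis: int, bins: int = 64) -> float:
    """Approximate the median of one coordinate from merged bin counts.

    Assumption: each partition only needs to report its min, max and
    per-bin counts, so no partition ships raw points to a coordinator.
    """
    nonempty = [c for c in chunks if c]
    lo = min(min(p[axis] for p in c) for c in nonempty)
    hi = max(max(p[axis] for p in c) for c in nonempty)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    total = [0] * bins
    for c in nonempty:             # in a real system: computed locally, then summed
        for p in c:
            idx = min(int((p[axis] - lo) / width), bins - 1)
            total[idx] += 1
    half = sum(total) / 2
    cum = 0
    for i, cnt in enumerate(total):
        if cum + cnt >= half:
            # linear interpolation inside the bin that contains the median
            return lo + (i + (half - cum) / cnt) * width
        cum += cnt
    return hi


def build(chunks: Chunks, k: int, depth: int = 0, leaf_size: int = 8) -> Node:
    """Recursively subdivide the data, cycling through the k coordinates."""
    node = Node(size=sum(len(c) for c in chunks))
    if node.size <= leaf_size:
        return node
    axis = depth % k
    split = approx_median(chunks, axis)
    left = [[p for p in c if p[axis] <= split] for c in chunks]
    right = [[p for p in c if p[axis] > split] for c in chunks]
    if not any(left) or not any(right):
        return node                # degenerate split (e.g. many duplicates)
    node.axis, node.split = axis, split
    node.left = build(left, k, depth + 1, leaf_size)
    node.right = build(right, k, depth + 1, leaf_size)
    return node


# Usage: 2000 random 2-d points spread over 4 simulated partitions.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(2000)]
tree = build([points[i::4] for i in range(4)], k=2)
```

In this sketch the approximate median always falls inside the bin that contains the true median, so the imbalance of a split is bounded by the number of points in that one bin. Increasing `bins` tightens the approximation at the cost of larger per-partition summaries.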
Taxonomy
Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
