Scalable $k$-d trees for distributed data

Aritra Chakravorty; William S. Cleveland; Patrick J. Wolfe

arXiv:2201.08288 · cs.DS · January 21, 2022

TL;DR

This paper introduces a scalable method for constructing $k$-d trees over distributed data by approximating the median at each recursive split of the data. The resulting trees support range search, clustering, and nearest-neighbor queries on large datasets.

Contribution

It presents a novel, scalable approach for building $k$-d trees in distributed environments, with theoretical guarantees on the quality of the median approximation and a simulation study of its accuracy and scalability.

Findings

01

The method achieves high accuracy in median approximation.

02

It scales well to large datasets.

03

The approach comes with theoretical guarantees on the quality of the approximation.
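One way to make "accuracy in median approximation" concrete is rank error: how far the fraction of all points at or below the approximate median is from 1/2. The sketch below assumes a simple, hypothetical estimator (each partition reports its local median and size, and a coordinator takes the size-weighted median of those local medians). This is an illustrative stand-in, not necessarily the estimator used in the paper; `weighted_median`, `approx_median`, and `rank_error` are hypothetical names.

```python
import random
import statistics
from bisect import bisect_right


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0
    for v, w in pairs:
        cum += w
        if cum >= total / 2:
            return v
    return pairs[-1][0]


def approx_median(partitions):
    """Hypothetical estimator: each partition sends only (local median, size);
    the coordinator returns the size-weighted median of the local medians."""
    local = [(statistics.median(p), len(p)) for p in partitions if p]
    return weighted_median([m for m, _ in local], [n for _, n in local])


def rank_error(partitions, m):
    """|fraction of all points <= m - 1/2|; values near 0 mean m splits the
    pooled data roughly in half. Needs all the data, so it is a diagnostic only."""
    all_points = sorted(x for p in partitions for x in p)
    return abs(bisect_right(all_points, m) / len(all_points) - 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    # 20 simulated partitions drawn from slightly shifted normal distributions.
    parts = [[rng.gauss(i * 0.1, 1.0) for _ in range(5000)] for i in range(20)]
    m = approx_median(parts)
    print(f"approximate median = {m:.4f}, rank error = {rank_error(parts, m):.4f}")
```

Only two numbers per partition cross the network here, which is what makes this kind of estimator cheap; the trade-off is that its rank error grows when partitions hold very differently distributed data.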

Abstract

Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest neighbors search, local regression, and so forth. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees of the quality of approximation using this approach, along with a simulation study quantifying the accuracy and scalability of our proposed approach in practice.
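The recursive construction described in the abstract can be sketched as follows, under stated assumptions. Data are simulated as a list of partitions (one list of points per hypothetical machine); at each node the split coordinate cycles with depth, an approximate median is computed on that coordinate, and every partition filters its own points into left and right halves, so only the scalar cut value has to be broadcast. The median estimator here is a swapped-in, hypothetical choice (pooled uniform random samples from each partition), not the paper's estimator, and `build`, `approx_median`, and `range_search` are illustrative names.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    dim: int = -1                          # split coordinate (-1 at leaves)
    split: float = 0.0                     # approximate median used as the cut
    left: Optional["Node"] = None          # points with x[dim] <= split
    right: Optional["Node"] = None         # points with x[dim] > split
    points: Optional[List[Point]] = None   # stored only at leaves


def approx_median(parts, dim, rate, rng):
    """Hypothetical estimator: each partition contributes a uniform random sample
    of roughly `rate` of its points; the coordinator takes the sample median."""
    pooled = []
    for p in parts:
        if p:
            k = min(len(p), max(1, int(rate * len(p))))
            pooled.extend(x[dim] for x in rng.sample(p, k))
    return statistics.median(pooled)


def build(parts, k, depth=0, leaf_size=32, rate=0.05, rng=None):
    """Recursively build a k-d tree over partitioned data using approximate medians."""
    if rng is None:
        rng = random.Random(0)
    n = sum(len(p) for p in parts)
    if n <= leaf_size:
        return Node(points=[x for p in parts for x in p])
    dim = depth % k                        # cycle through coordinates
    m = approx_median(parts, dim, rate, rng)
    # Each partition splits locally; only the scalar m is shared.
    left = [[x for x in p if x[dim] <= m] for p in parts]
    right = [[x for x in p if x[dim] > m] for p in parts]
    if not any(left) or not any(right):    # degenerate cut (e.g. tied values): stop
        return Node(points=[x for p in parts for x in p])
    return Node(dim, m,
                build(left, k, depth + 1, leaf_size, rate, rng),
                build(right, k, depth + 1, leaf_size, rate, rng))


def range_search(node, lo, hi):
    """Return all points x with lo[i] <= x[i] <= hi[i] for every coordinate i."""
    if node.points is not None:
        return [x for x in node.points
                if all(a <= v <= b for a, v, b in zip(lo, x, hi))]
    out = []
    if lo[node.dim] <= node.split:         # box overlaps the left half-space
        out += range_search(node.left, lo, hi)
    if hi[node.dim] > node.split:          # box overlaps the right half-space
        out += range_search(node.right, lo, hi)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    # 8 hypothetical machines, each holding 2,500 random points in [0, 1]^2.
    parts = [[(rng.random(), rng.random()) for _ in range(2500)] for _ in range(8)]
    tree = build(parts, k=2)
    box_lo, box_hi = (0.2, 0.2), (0.4, 0.5)
    hits = range_search(tree, box_lo, box_hi)
    brute = [x for p in parts for x in p
             if all(a <= v <= b for a, v, b in zip(box_lo, x, box_hi))]
    print(len(hits), len(brute))           # the two counts should agree
```

Because range search only compares against the stored cut values, its results are exact no matter how good the median approximation is; approximation quality affects how balanced the tree is, and therefore how much work each query does.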

Taxonomy

Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Advanced Database Systems and Queries