Building a Balanced k-d Tree with MapReduce
Russell A. Brown

TL;DR
This paper introduces a MapReduce-compatible algorithm that builds a balanced k-d tree in O(kn log n) time by presorting the data in each of k dimensions, which permits building and searching the tree as a distributed graph.
Contribution
It presents a novel presorting-based algorithm for balanced k-d tree construction suitable for distributed environments like MapReduce.
Findings
Builds a balanced k-d tree in O(kn log n) time using presorting.
Permits building and searching a k-d tree that is represented as a distributed graph.
Suitable for large-scale data indexing in distributed systems.
Abstract
The original description of the k-d tree recognized that rebalancing techniques, such as those used to build an AVL tree or a red-black tree, are not applicable to a k-d tree. Hence, to build a balanced k-d tree, it is necessary to obtain all of the data before building the tree and then to build the tree via recursive subdivision of the data. One algorithm for building a balanced k-d tree finds the median of the data for each recursive subdivision of the data and builds the tree in O(n log n) time. A new algorithm builds a balanced k-d tree by presorting the data in each of k dimensions prior to building the tree, then preserves the order of the k presorts during recursive subdivision of the data and builds the tree in O(kn log n) time. This new algorithm is amenable to execution via MapReduce and permits building and searching a k-d tree that is represented as a distributed graph.
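The presort-then-partition idea in the abstract can be sketched in a single process as follows. This is a minimal illustration, not the paper's implementation: the names `Node`, `super_key`, and `build_kd_tree` are hypothetical, duplicate points are dropped up front, and ties on one coordinate are broken by comparing the remaining coordinates in cyclic order (an assumption made here so that every point has a unique sort key on every axis). The MapReduce distribution itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def super_key(p: Point, axis: int) -> Point:
    # Hypothetical tie-breaker: coordinates rotated so `axis` comes first.
    k = len(p)
    return tuple(p[(axis + i) % k] for i in range(k))


def build_kd_tree(points: Sequence[Point]) -> Optional[Node]:
    unique = list(set(points))  # assumption: duplicates are discarded
    if not unique:
        return None
    k = len(unique[0])
    # Presort once per dimension; no further sorting happens below.
    presorted = [sorted(unique, key=lambda p, a=a: super_key(p, a))
                 for a in range(k)]
    return _build(presorted, 0)


def _build(lists: List[List[Point]], axis: int) -> Optional[Node]:
    n = len(lists[axis])
    if n == 0:
        return None
    if n == 1:
        return Node(lists[axis][0])
    k = len(lists)
    median = lists[axis][n // 2]
    mkey = super_key(median, axis)
    # Split every presorted list around the median with one
    # order-preserving pass, so each half stays sorted on its own axis.
    lower = [[p for p in lst if super_key(p, axis) < mkey] for lst in lists]
    upper = [[p for p in lst if super_key(p, axis) > mkey] for lst in lists]
    nxt = (axis + 1) % k
    return Node(median, _build(lower, nxt), _build(upper, nxt))
```

For example, `build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])` returns a tree whose root splits on the first coordinate and whose children split on the second. Because each subdivision only filters the k presorted lists, the sorting work is done once up front and every level of recursion is a linear pass per list.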
Taxonomy
Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
