R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets
Tin Vu, Ahmed Eldawy

TL;DR
R*-Grove is a new spatial partitioning method that efficiently divides large datasets into balanced, high-quality partitions. It improves query performance and integrates easily with big data platforms.
Contribution
The paper introduces R*-Grove, a novel partitioning technique that addresses the load-balancing and spatial-quality challenges in big spatial data systems.
Findings
Outperforms existing partitioning techniques in spatial query processing
Achieves high load balance and block utilization
Easily integrates with platforms like Spark and Hadoop
Abstract
The rapid growth of big spatial data has urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building partitions of high spatial quality while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove,…
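The sample-based approach that the abstract attributes to previous work (build a temporary structure on a sample, then reuse its leaf boundaries as partitions) can be sketched roughly as follows. This is a minimal illustration of an STR-style baseline on 2D points, not R*-Grove itself and not the code of any particular system. The function names `str_partition` and `assign`, the synthetic uniform data, the sample size, and the partition count are all assumptions made for the example.

```python
import math
import random

def str_partition(sample, num_partitions):
    """STR-style split of 2D sample points into roughly num_partitions boxes.

    Returns a list of (x_min, y_min, x_max, y_max) bounding boxes.
    Hypothetical helper for illustration only.
    """
    if not sample:
        return []
    n = len(sample)
    strips = math.ceil(math.sqrt(num_partitions))        # vertical strips
    tiles_per_strip = math.ceil(num_partitions / strips)  # tiles inside each strip
    per_strip = math.ceil(n / strips)
    by_x = sorted(sample)                                 # sort by x first
    boxes = []
    for i in range(0, n, per_strip):
        # Within a strip, sort by y and cut into equal-sized tiles.
        strip = sorted(by_x[i:i + per_strip], key=lambda p: p[1])
        per_tile = math.ceil(len(strip) / tiles_per_strip)
        for j in range(0, len(strip), per_tile):
            tile = strip[j:j + per_tile]
            xs = [p[0] for p in tile]
            ys = [p[1] for p in tile]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def assign(point, boxes):
    """Pick the box that needs the least area enlargement to cover the point."""
    def enlargement(box):
        x1, y1, x2, y2 = box
        nx1, ny1 = min(x1, point[0]), min(y1, point[1])
        nx2, ny2 = max(x2, point[0]), max(y2, point[1])
        return (nx2 - nx1) * (ny2 - ny1) - (x2 - x1) * (y2 - y1)
    return min(range(len(boxes)), key=lambda i: enlargement(boxes[i]))

# Synthetic data (assumption): uniform points in the unit square.
random.seed(0)
data = [(random.random(), random.random()) for _ in range(20000)]
sample = random.sample(data, 1600)

# Partition boundaries come from the sample only.
boxes = str_partition(sample, 16)

# Route every record to a partition and measure the resulting load.
counts = [0] * len(boxes)
for p in data:
    counts[assign(p, boxes)] += 1

print("partitions:", len(boxes))
print("min/max load:", min(counts), max(counts))
```

Because the boundaries are derived from the sample alone, the load of each partition on the full dataset is only approximately equal. Comparing the minimum and maximum counts gives a quick sense of the load-balance concern the abstract raises.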
