NOCAP: Near-Optimal Correlation-Aware Partitioning Joins
Zichen Zhu, Xiao Hu, Manos Athanassoulis

TL;DR
NOCAP is a new partitioning join algorithm that near-optimally exploits correlation skew in the join attributes, improving performance over state-of-the-art partitioning joins across a wide range of memory budgets.
Contribution
The paper introduces NOCAP, a correlation-aware partitioning algorithm that achieves near-optimal join performance by tailoring partitioning to attribute correlation distributions.
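To make "tailoring partitioning to the correlation distribution" concrete, here is a minimal Python sketch under a deliberately simplified model that is assumed for illustration and is not the paper's cost model. Given a memory budget of PK tuples, it compares an HHJ-style choice (a PK stays in memory if its hash partition is memory-resident) with a correlation-aware choice (keep the PKs that the most FK tuples reference), and counts the FK tuples that would have to be spilled. The helper names `spilled_fk_tuples`, `hash_resident_pks`, and `hot_resident_pks` are hypothetical, and keys are small non-negative ints so that Python's built-in `hash` maps each key to itself.

```python
from collections import Counter


def spilled_fk_tuples(fk_keys, resident_pks):
    """Number of FK tuples whose matching PK tuple is not memory-resident.

    Simplified model (an assumption for illustration): each such FK tuple is
    written to a spill partition once and read back once.
    """
    return sum(1 for k in fk_keys if k not in resident_pks)


def hash_resident_pks(pk_keys, num_partitions, resident_partitions):
    """HHJ-style choice: a PK stays in memory iff its hash partition is among
    the first `resident_partitions` partitions, however often it is referenced."""
    return {k for k in pk_keys if hash(k) % num_partitions < resident_partitions}


def hot_resident_pks(fk_keys, budget):
    """Correlation-aware choice: count FK matches per PK (the correlation
    distribution) and keep the `budget` most-referenced PKs in memory."""
    counts = Counter(fk_keys)
    return {k for k, _ in counts.most_common(budget)}


if __name__ == "__main__":
    pk_keys = range(1000)
    # Skewed FK column: keys 1..9 each get 1000 extra references.
    fk_keys = [k for k in range(1, 10) for _ in range(1000)] + list(range(1000))

    # Same memory budget for both policies: 100 of the 1000 PK tuples.
    hhj = hash_resident_pks(pk_keys, num_partitions=10, resident_partitions=1)
    hot = hot_resident_pks(fk_keys, budget=100)
    print(len(hhj), spilled_fk_tuples(fk_keys, hhj))
    print(len(hot), spilled_fk_tuples(fk_keys, hot))
```

With this input, both policies keep 100 PK tuples, but the hash-based choice spills 9,900 of the 10,000 FK tuples while the frequency-based choice spills 900. The sketch only decides which PKs stay in memory; how the spilled remainder should be partitioned is the part the paper's cost-based analysis addresses and is not modeled here.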
Findings
NOCAP outperforms state-of-the-art algorithms by up to 30% on skewed data.
NOCAP is up to 4 times faster than Grace Hash Join.
NOCAP maintains high performance across a wide range of memory budgets.
Abstract
Storage-based joins are still commonly used today because the memory budget does not always scale with the data size. One of the many join algorithms developed that has been widely deployed and proven to be efficient is the Hybrid Hash Join (HHJ), which is designed to exploit any available memory to maximize the data that is joined directly in memory. However, HHJ cannot fully exploit detailed knowledge of the join attribute correlation distribution. In this paper, we show that given a correlation skew in the join attributes, HHJ partitions data in a suboptimal way. To do that, we derive the optimal partitioning using a new cost-based analysis of partitioning-based joins that is tailored for primary key-foreign key (PK-FK) joins, one of the most common join types. This optimal partitioning strategy has a high memory cost, thus, we further derive an approximate algorithm that has…
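For reference, the classic textbook estimate of HHJ's I/O cost shows how it "exploits any available memory": if a fraction q of the build side R is kept in memory, both inputs are read once, and only the remaining (1 - q) of each is written to partitions and read back. The Python sketch below encodes that estimate; the function name `hhj_io_pages` is hypothetical, and the model assumes uniformly distributed join keys and ignores output writes and CPU cost.

```python
def hhj_io_pages(r_pages, s_pages, q):
    """Approximate page I/Os of a hybrid hash join, excluding output writes.

    Textbook uniform model (assumed here): a fraction q of the build side R,
    and, because join keys are assumed uniformly distributed, the same
    fraction of the probe side S, is joined directly in memory. The other
    (1 - q) of both inputs is written to partitions and read back once.
    q = 0 gives Grace Hash Join; q = 1 gives a fully in-memory hash join.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    total = r_pages + s_pages
    # One read of both inputs, plus a write and a re-read of the spilled share.
    return total + 2 * (1 - q) * total


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(q, hhj_io_pages(100, 1000, q))
```

The key simplification is that the share of S joined in memory equals q. Under correlation skew that share depends on which PK keys are resident rather than on q alone, which is the gap the abstract highlights.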
