Fast Distributed Complex Join Processing
Hao Zhang, Miao Qiao, Jeffrey Xu Yu, Hong Cheng

TL;DR
This paper introduces ADJ, a novel distributed join method that optimizes communication, pre-computing, and computation costs to efficiently evaluate complex multi-way joins in a single round.
Contribution
The paper presents ADJ (Adaptive Distributed Join), an adaptive approach that finds an optimal query plan by exploring cost-effective partial results, weighing pre-computing, communication, and computation costs against one another.
Findings
ADJ outperforms existing multi-way join methods by up to orders of magnitude in the authors' experiments.
High-quality cost estimation is achieved through sampling.
The approach effectively balances pre-computing, communication, and computation.
Abstract
In this work, we study the problem of co-optimizing communication, pre-computing, and computation costs in one-round multi-way join evaluation. We propose ADJ (Adaptive Distributed Join), a multi-way join approach for complex joins that finds an optimal query plan by exploring cost-effective partial results in terms of the trade-off between pre-computing, communication, and computation. We analyze the input relations for a given join query and find an optimal plan over a set of query plans of a specific form, with high-quality cost estimation by sampling. Our extensive experiments confirm that ADJ outperforms the existing multi-way join methods by up to orders of magnitude.
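The abstract mentions cost estimation by sampling but does not spell out the estimator. As a rough illustration of the general idea only, the Python sketch below estimates the size of a two-way join R(a, b) ⋈ S(b, c) by drawing a uniform sample of R, counting the matching S tuples for each sampled tuple, and scaling the count up. The function names, the toy relations, and the choice of a plain uniform sample are all assumptions made for this example; this is not ADJ's actual estimator.

```python
import random
from collections import Counter


def exact_join_size(r, s):
    """Exact size of R(a, b) JOIN S(b, c) on the shared attribute b."""
    s_counts = Counter(b for b, _ in s)  # number of S tuples per b value
    return sum(s_counts[b] for _, b in r)


def estimate_join_size(r, s, sample_size, seed=0):
    """Estimate |R JOIN S| from a uniform sample of R (without replacement).

    Hypothetical helper for illustration; not the estimator used by ADJ.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(r))
    if k == 0:
        return 0.0
    s_counts = Counter(b for b, _ in s)
    sample = rng.sample(r, k)
    # Each sampled R tuple contributes as many results as it has matches in S.
    matches = sum(s_counts[b] for _, b in sample)
    # Scale the sampled match count up to the full relation.
    return matches * len(r) / k


# Toy relations (made up for this example): R has 1000 tuples over 10 join
# keys, and S has b + 1 tuples for each join key b.
r = [(i, i % 10) for i in range(1000)]
s = [(b, c) for b in range(10) for c in range(b + 1)]

exact = exact_join_size(r, s)
estimate = estimate_join_size(r, s, sample_size=100)
print(exact, estimate)
```

With a sample as large as R itself the estimate equals the exact size. Smaller samples lower the estimation cost but make the estimate less precise, which is the kind of trade-off a sampling-based cost model has to manage.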
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Peer-to-Peer Network Technologies
