A Near-Optimal Parallel Algorithm for Joining Binary Relations
Bas Ketsman, Dan Suciu, and Yufei Tao

TL;DR
This paper introduces a constant-round parallel algorithm for joining binary relations in the MPC model. Its load is within a polylogarithmic factor of the known lower bound, and the result implies a near-optimal solution to subgraph enumeration.
Contribution
It presents a novel constant-round MPC algorithm for binary relation joins with near-optimal load, based on the new isolated cartesian product theorem.
Findings
Achieves a load of Õ(m/p^{1/ρ}), where ρ is the join's fractional edge covering number, matching a known lower bound up to a polylogarithmic factor
Provides a new theorem offering insights into the mathematical structure of join problems
Implies that subgraph enumeration for constant-sized patterns can be settled optimally, up to a polylogarithmic factor, in the MPC model
Abstract
We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of Õ(m/p^{1/ρ}), where m is the total size of the input relations, p is the number of machines, ρ is the join's fractional edge covering number, and Õ(.) hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name the "isolated cartesian product theorem") that provides fresh insight into the problem's mathematical structure. Our result implies that the subgraph enumeration problem, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.
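To make the load formula concrete, here is a minimal sketch that evaluates m/p^{1/ρ} for a small join. It is not the paper's algorithm: it only computes the fractional edge covering number ρ of a tiny query graph and plugs it into the bound, ignoring the polylogarithmic factor. The function names `fractional_edge_cover_number` and `load_bound` are hypothetical. The brute-force search assumes the standard fact that, for graphs (binary relations), the fractional edge cover linear program has an optimal solution with weights in {0, 1/2, 1}.

```python
from fractions import Fraction
from itertools import product


def fractional_edge_cover_number(edges):
    """Brute-force rho for a small graph given as a list of 2-attribute edges.

    Assumption: an optimal fractional edge cover of a graph can be chosen
    half-integral, so trying weights in {0, 1/2, 1} per edge suffices.
    Runs in 3^|edges| time, so it is only meant for constant-sized patterns.
    """
    vertices = {v for e in edges for v in e}
    half_steps = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for weights in product(half_steps, repeat=len(edges)):
        # Every attribute must receive total weight >= 1 from its edges.
        covered = all(
            sum(w for w, e in zip(weights, edges) if v in e) >= 1
            for v in vertices
        )
        if covered:
            total = sum(weights)
            if best is None or total < best:
                best = total
    return best


def load_bound(m, p, rho):
    """Evaluate m / p^(1/rho), dropping the polylogarithmic factor."""
    return m / p ** (1 / float(rho))


# Triangle join R(A,B), S(B,C), T(A,C): weight 1/2 on each edge is optimal.
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
rho_triangle = fractional_edge_cover_number(triangle)
triangle_load = load_bound(1_600_000, 64, rho_triangle)
```

For the triangle join, ρ = 3/2, so with p = 64 machines the bound is m/p^{2/3} = m/16; a larger ρ means each machine must receive a larger share of the input.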
Taxonomy
Topics: Optimization and Packing Problems
