GYM: A Multiround Join Algorithm In MapReduce
Foto Afrati, Manas Joglekar, Christopher Ré, Semih Salihoglu, Jeffrey D. Ullman

TL;DR
This paper introduces GYM, a multiround join algorithm in MapReduce that trades off the number of rounds against communication cost when computing equijoins. It builds on generalized hypertree decompositions (GHDs) of queries and a new notion called intersection width.
Contribution
It presents GYM, a distributed join algorithm that leverages new query complexity measures to improve round and communication efficiency in MapReduce.
Findings
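GYM computes joins in O(d + log(n)) rounds, where d is the depth of the GHD it takes as input, with communication cost bounded in terms of the query's width, input size, and output size.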
Introduces intersection width, a new notion defined on queries and their generalized hypertree decompositions (GHDs).
Provides a spectrum of tradeoffs between communication cost and number of rounds.
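To make the two quantities being traded off more concrete, the following is a minimal single-process sketch of a chain equijoin R(a, b) ⋈ S(b, c) ⋈ T(c, d) computed in two simulated MapReduce rounds. It is not GYM itself and is not taken from the paper. The function names `mapreduce_join_round` and `two_round_chain_join` are hypothetical, and counting communication cost as the number of tuples shuffled from mappers to reducers is a simplifying assumption.

```python
from collections import defaultdict


def mapreduce_join_round(left, right, left_key, right_key):
    """Simulate one MapReduce round that equijoins two lists of dict tuples.

    Returns (joined_tuples, communication_cost). The cost counts every
    tuple the map phase sends to a reducer (an assumed cost model).
    """
    # One bucket per join-key value: (tuples from left, tuples from right).
    reducers = defaultdict(lambda: ([], []))
    cost = 0
    # Map phase: route each tuple to the reducer that owns its join key.
    for t in left:
        reducers[t[left_key]][0].append(t)
        cost += 1
    for t in right:
        reducers[t[right_key]][1].append(t)
        cost += 1
    # Reduce phase: each reducer pairs the left and right tuples it received.
    out = []
    for lefts, rights in reducers.values():
        for l in lefts:
            for r in rights:
                out.append({**l, **r})
    return out, cost


def two_round_chain_join(R, S, T):
    """Join R and S on b in round 1, then join the result with T on c in round 2."""
    rs, cost1 = mapreduce_join_round(R, S, "b", "b")
    rst, cost2 = mapreduce_join_round(rs, T, "c", "c")
    return rst, 2, cost1 + cost2


R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": 100}, {"b": 10, "c": 200}, {"b": 30, "c": 300}]
T = [{"c": 100, "d": "x"}, {"c": 300, "d": "y"}]

result, rounds, cost = two_round_chain_join(R, S, T)
print(result, rounds, cost)
```

In this toy run the intermediate result of round 1 is shuffled again in round 2, so the total cost depends on intermediate sizes as well as on the input. That dependence is the kind of effect a round-versus-communication analysis has to account for.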
Abstract
Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for a spectrum of rounds for the problem of computing the equijoin of relations. Specifically, given any query with a given width, intersection width, input size, and output size, and a cluster of machines with a given amount of memory available per machine, we show that: (1) the query can be computed in […] rounds with […] communication cost. (2) the query can be computed in […] rounds with […] communication cost. Intersection width is a new notion of queries and generalized hypertree decompositions (GHDs) of queries we…
Taxonomy
Topics: Complexity and Algorithms in Graphs · Advanced Database Systems and Queries · Data Management and Algorithms
