Three-Way Joins on MapReduce: An Experimental Study
Ben Kimmett, Alex Thomo, S. Venkatesh

TL;DR
This paper experimentally analyzes three-way join algorithms on MapReduce, measures how well they scale, and offers guidance on when to use a multi-way join versus cascaded two-way joins on large data sets.
Contribution
It evaluates the state-of-the-art MapReduce multi-way join algorithm of Afrati and Ullman and compares its performance with cascaded two-way joins, yielding practical guidance for large-scale data processing.
Findings
The Afrati and Ullman multi-way join algorithm scales much better than their original paper suggests.
Cascaded two-way joins are more efficient when the join result must be summarized or aggregated rather than only enumerated.
The paper offers practical guidelines for choosing join strategies on large data sets.
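To make the two strategies concrete, here is a minimal in-memory sketch in Python. It is not the paper's Hadoop implementation. It assumes a chain join R(A,B) ⋈ S(B,C) ⋈ T(C,D), which this summary does not specify, and all function and parameter names (`cascaded_join`, `one_round_join`, `b_shares`, `c_shares`) are hypothetical. The one-round version follows the general idea of a share-based multi-way join: reducers form a grid indexed by hashes of B and C, each S tuple goes to exactly one reducer, and R and T tuples are replicated along one grid dimension.

```python
from collections import defaultdict


def cascaded_join(R, S, T):
    """Two rounds: join R with S on B, then join that result with T on C."""
    # Round 1: group R by B and probe with S.
    r_by_b = defaultdict(list)
    for a, b in R:
        r_by_b[b].append(a)
    rs = [(a, b, c) for b, c in S for a in r_by_b.get(b, [])]

    # Round 2: group T by C and probe with the intermediate result.
    t_by_c = defaultdict(list)
    for c, d in T:
        t_by_c[c].append(d)
    return sorted((a, b, c, d) for a, b, c in rs for d in t_by_c.get(c, []))


def one_round_join(R, S, T, b_shares=2, c_shares=2):
    """One round over a b_shares x c_shares grid of simulated reducers."""
    def h(v):  # bucket for attribute B (assumed hash function)
        return hash(v) % b_shares

    def g(v):  # bucket for attribute C (assumed hash function)
        return hash(v) % c_shares

    reducers = defaultdict(lambda: {"R": [], "S": [], "T": []})

    # Map phase: S goes to one cell; R and T are replicated along the
    # grid dimension whose attribute they do not carry.
    for a, b in R:
        for j in range(c_shares):
            reducers[(h(b), j)]["R"].append((a, b))
    for b, c in S:
        reducers[(h(b), g(c))]["S"].append((b, c))
    for c, d in T:
        for i in range(b_shares):
            reducers[(i, g(c))]["T"].append((c, d))

    # Reduce phase: each cell joins its local fragments independently.
    out = []
    for bucket in reducers.values():
        out.extend(cascaded_join(bucket["R"], bucket["S"], bucket["T"]))
    return sorted(out)
```

Each result tuple (a, b, c, d) is produced only at the cell (h(b), g(c)), because that is the only cell holding the matching S tuple, so the one-round version emits no duplicates. The trade-off the sketch exposes is replication of R and T in a single round versus materializing the intermediate R ⋈ S result between two rounds.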
Abstract
We study three-way joins on MapReduce. Joins are very useful in a multitude of applications from data integration and traversing social networks, to mining graphs and automata-based constructions. However, joins are expensive, even for moderate data sets; we need efficient algorithms to perform distributed computation of joins using clusters of many machines. MapReduce has become an increasingly popular distributed computing system and programming paradigm. We consider a state-of-the-art MapReduce multi-way join algorithm by Afrati and Ullman and show when it is appropriate for use on very large data sets. By providing a detailed experimental study, we demonstrate that this algorithm scales much better than what is suggested by the original paper. However, if the join result needs to be summarized or aggregated, as opposed to being only enumerated, then the aggregation step can be…
Taxonomy
Topics: Peer-to-Peer Network Technologies · Caching and Content Delivery · Data Management and Algorithms
