Handling Skew in Multiway Joins in Parallel Processing
Foto N. Afrati, Jeffrey D. Ullman, Angelos Vasilakopoulos

TL;DR
This paper introduces a novel technique for efficiently handling data skew in multiway joins within MapReduce, minimizing communication costs during distributed query processing.
Contribution
The paper adapts the Shares algorithm to handle skew in multiway joins computed in a single MapReduce round, reducing the amount of data transferred from mappers to reducers.
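Some background on the Shares algorithm helps here. As it is usually described (this summary is ours, not taken from this page), the k reducers form a grid with one dimension per join attribute. Each attribute a gets a "share" s_a, and the shares multiply to k. The sketch below assumes a triangle join R(A,B) ⋈ S(B,C) ⋈ T(C,A) with k = 8 and a share of 2 on every attribute. The names `SHARES`, `bucket`, and `reducers_for` are hypothetical. A tuple is hashed on the attributes it has and replicated across every bucket of the attributes it lacks, so each triangle (a, b, c) meets at exactly one reducer.

```python
from itertools import product

# Hypothetical setup: triangle join R(A,B) ⋈ S(B,C) ⋈ T(C,A) on a
# 3-D grid of k = s_A * s_B * s_C reducers.
SHARES = {"A": 2, "B": 2, "C": 2}   # assumed shares, so k = 8
ATTRS = ["A", "B", "C"]              # fixed order of grid coordinates

def bucket(value, share):
    """Hash one attribute value into one of `share` buckets."""
    return hash(value) % share

def reducers_for(tuple_attrs, shares=SHARES):
    """Return every reducer coordinate that must receive this tuple.

    Attributes present in the tuple are hashed to a single bucket;
    attributes missing from it are replicated over all their buckets.
    """
    choices = []
    for attr in ATTRS:
        if attr in tuple_attrs:
            choices.append([bucket(tuple_attrs[attr], shares[attr])])
        else:
            choices.append(range(shares[attr]))
    return list(product(*choices))

# R(a=3, b=5) has no C value, so it goes to s_C = 2 reducers.
# (For small non-negative ints, Python's hash is the value itself.)
print(reducers_for({"A": 3, "B": 5}))  # → [(1, 1, 0), (1, 1, 1)]
```

The number of copies of a tuple equals the product of the shares of the attributes it lacks. This replication is what the communication cost counts.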
Findings
Reduces communication cost in skewed multiway joins
Efficient single-round MapReduce implementation
Improves load balancing during distributed join processing
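The toy example below shows why skew harms load balance. It is our own illustration, not an experiment from the paper. It hashes a two-way join R(A,B) ⋈ S(B,C) on B alone over k = 4 reducers. The data and the helper `reducer_loads` are hypothetical. When a single B value is a heavy hitter, every tuple carrying that value hashes to the same reducer.

```python
from collections import Counter

def reducer_loads(r_tuples, s_tuples, k):
    """Tuples per reducer when R(A,B) ⋈ S(B,C) is hashed on B alone."""
    loads = Counter()
    for _, b in r_tuples:      # R tuples are (a, b)
        loads[hash(b) % k] += 1
    for b, _ in s_tuples:      # S tuples are (b, c)
        loads[hash(b) % k] += 1
    return [loads[i] for i in range(k)]

k = 4
s = [(b, b) for b in range(100)]                            # one S tuple per B value
uniform_r = [(a, a % 100) for a in range(1000)]             # each B value 10 times
skewed_r = [(a, 0 if a < 900 else a) for a in range(1000)]  # B = 0 in 900 tuples

print(reducer_loads(uniform_r, s, k))  # → [275, 275, 275, 275]
print(reducer_loads(skewed_r, s, k))   # → [950, 50, 50, 50]
```

Spreading the heavy-hitter tuples requires giving shares to other attributes (here A and C) for that part of the data. This replicates tuples, so it raises communication cost. As we understand it, balancing these two effects is the trade-off a skew-aware adaptation of Shares has to manage.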
Abstract
Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, an uneven distribution of the data across the servers is undesirable. One of the dominant measures to optimize in distributed environments is communication cost; in a MapReduce job, this is the amount of data transferred from the mappers to the reducers. In this paper we introduce a novel technique for handling skew when computing a multiway join in one MapReduce round with minimum communication cost. The technique is an adaptation of the Shares algorithm [Afrati et al., TKDE 2011].
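The abstract refers to communication cost. Under the Shares cost model of the cited TKDE 2011 paper, this cost is the sum over relations R_i of |R_i| times the product of the shares of the attributes missing from R_i, subject to the shares multiplying to k. The sketch below brute-forces integer shares under that model. The skew step is a simplifying assumption of ours. We model a residual join in which B holds a single heavy-hitter value by fixing s_B = 1, since hashing one value cannot spread load, and we give all k reducers to the other attributes. The paper's exact construction may differ. The relation sizes, `comm_cost`, and `best_shares` are hypothetical.

```python
from itertools import product
from math import prod

def comm_cost(relations, shares):
    """Shares cost model: each tuple of a relation is replicated once per
    bucket combination of the attributes the relation does not contain."""
    return sum(size * prod(s for a, s in shares.items() if a not in attrs)
               for attrs, size in relations)

def best_shares(relations, attrs, k, fixed_to_one=()):
    """Brute-force integer shares whose product is exactly k.

    Attributes in `fixed_to_one` get share 1 (our assumed treatment of an
    attribute holding a single heavy-hitter value in a residual join).
    """
    free = [a for a in attrs if a not in fixed_to_one]
    best = None
    for combo in product(range(1, k + 1), repeat=len(free)):
        if prod(combo) != k:
            continue
        shares = {a: 1 for a in fixed_to_one} | dict(zip(free, combo))
        cost = comm_cost(relations, shares)
        if best is None or cost < best[0]:
            best = (cost, shares)
    return best

# Triangle join R(A,B) ⋈ S(B,C) ⋈ T(C,A), 1000 tuples each, k = 8.
triangle = [({"A", "B"}, 1000), ({"B", "C"}, 1000), ({"C", "A"}, 1000)]
print(best_shares(triangle, ["A", "B", "C"], 8))
# → (6000, {'A': 2, 'B': 2, 'C': 2})

# Hypothetical residual join where B is one heavy-hitter value:
# 900 R tuples and 900 S tuples carry it; T is unaffected.
residual = [({"A", "B"}, 900), ({"B", "C"}, 900), ({"C", "A"}, 1000)]
print(best_shares(residual, ["A", "B", "C"], 8, fixed_to_one=("B",)))
# → (6400, {'B': 1, 'A': 2, 'C': 4})
```

With equal relation sizes, the cheapest split is symmetric. In the heavy-hitter residual join, the search moves the reducers onto A and C. The skewed tuples are then replicated across many reducers rather than piled onto one. Brute force is only practical for small k and few attributes.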
Taxonomy
Topics: Cloud Computing and Resource Management · Graph Theory and Algorithms · Data Management and Algorithms
