SharesSkew: An Algorithm to Handle Skew for Joins in MapReduce
Foto Afrati, Nikos Stasinopoulos, Jeffrey D. Ullman, Angelos Vassilakopoulos

TL;DR
This paper presents SharesSkew, an algorithm for computing multiway joins in one round of MapReduce when the data is skewed. It identifies heavy hitters (join-attribute values that appear very frequently) and spreads their records across reducers while minimizing communication cost.
Contribution
It introduces SharesSkew, an adaptation of the Shares algorithm that handles skew in multiway joins, and gives closed forms for the algorithm's parameters for chain and symmetric joins.
Findings
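Distributing heavy-hitter records with an adaptation of Shares avoids reducer skew while minimizing communication cost, i.e., the data transferred from mappers to reducers.
The algorithm is implemented for experimentation and offered as open source software.
Closed forms simplify computing the algorithm's parameters for chain and symmetric joins.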
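To make the idea of a closed-form parameter choice concrete, here is a minimal sketch for the simplest case only: a two-way join R(A,B) ⋈ S(B,C) with a single heavy-hitter value of B. It assumes the records of R carrying that value are hashed on A into x buckets and those of S are hashed on C into y buckets, with x·y = k reducers, so the communication cost is r·y + s·x. Minimizing this under x·y = k gives x = sqrt(k·r/s) and y = sqrt(k·s/r). The function names and the integer-rounding strategy are illustrative assumptions, not taken from the paper, and the chain and symmetric joins treated in the paper are not covered here.

```python
import math

def real_shares(r, s, k):
    """Real-valued shares minimizing r*y + s*x subject to x*y = k.

    r, s: number of heavy-hitter records in R and S (both > 0).
    k:    number of reducers reserved for this heavy hitter.
    """
    x = math.sqrt(k * r / s)  # buckets for R (hashed on A)
    y = math.sqrt(k * s / r)  # buckets for S (hashed on C)
    return x, y

def integer_shares(r, s, k):
    """Best integer grid (x, y, cost) with x*y == k, found by trying divisors of k."""
    best = None
    for x in range(1, k + 1):
        if k % x == 0:
            y = k // x
            cost = r * y + s * x  # each R record goes to y reducers, each S record to x
            if best is None or cost < best[2]:
                best = (x, y, cost)
    return best

if __name__ == "__main__":
    print(real_shares(400, 100, 16))
    print(integer_shares(400, 100, 16))
```

The larger relation gets more buckets, so each of its records is replicated fewer times. When r = s the grid is square and the cost is 2·r·sqrt(k).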
Abstract
In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize for communication cost, i.e., the amount of data that is transferred from the mappers to the reducers. We identify join-attribute values that appear very frequently, called Heavy Hitters (HH). We distribute HH-valued records to reducers, avoiding skew, by using an adaptation of the Shares algorithm [AfUl] to achieve minimum communication cost. Our algorithm is implemented for experimentation and is offered as open source software. Furthermore, we investigate a class of multiway joins for which a simpler variant of the algorithm can handle skew. We offer closed forms for computing the parameters of the algorithm for chain and symmetric joins.
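The routing idea in the abstract can be sketched for a two-way join R(A,B) ⋈ S(B,C). This is an illustrative simulation under stated assumptions, not the authors' implementation: the frequency threshold, the helper names (`h`, `heavy_hitters`, `map_phase`, `reduce_phase`), and the in-memory "reducers" are all hypothetical. Records with an ordinary B value are hashed on B to a single reducer. Records with a heavy-hitter B value are spread over an x-by-y grid of reducers: an R record picks its row by hashing A and is sent to every column, while an S record picks its column by hashing C and is sent to every row.

```python
import zlib
from collections import Counter, defaultdict

def h(value, buckets):
    """Deterministic hash of a value into range(buckets)."""
    return zlib.crc32(repr(value).encode()) % buckets

def heavy_hitters(R, S, threshold):
    """B values occurring more than `threshold` times in R or in S."""
    freq_r = Counter(b for _, b in R)
    freq_s = Counter(b for b, _ in S)
    return {b for b in freq_r.keys() | freq_s.keys()
            if freq_r[b] > threshold or freq_s[b] > threshold}

def map_phase(R, S, hh, x, y, k_light):
    """Emit (reducer_key, tagged_record) pairs for R(A,B) join S(B,C)."""
    out = []
    for a, b in R:
        if b in hh:
            i = h(a, x)  # row chosen by hashing A; replicate across all y columns
            out += [(("HH", b, i, j), ("R", a, b)) for j in range(y)]
        else:
            out.append((("light", h(b, k_light)), ("R", a, b)))
    for b, c in S:
        if b in hh:
            j = h(c, y)  # column chosen by hashing C; replicate across all x rows
            out += [(("HH", b, i, j), ("S", b, c)) for i in range(x)]
        else:
            out.append((("light", h(b, k_light)), ("S", b, c)))
    return out

def reduce_phase(pairs):
    """Group records by reducer key and join locally inside each group."""
    groups = defaultdict(list)
    for key, record in pairs:
        groups[key].append(record)
    result = []
    for records in groups.values():
        rs = [t for t in records if t[0] == "R"]
        ss = [t for t in records if t[0] == "S"]
        result += [(a, b, c) for _, a, b in rs for _, b2, c in ss if b == b2]
    return result

if __name__ == "__main__":
    R = [(i, "hot") for i in range(6)] + [(100, "cold")]
    S = [("hot", j) for j in range(4)] + [("cold", 7)]
    hh = heavy_hitters(R, S, threshold=3)
    pairs = map_phase(R, S, hh, x=3, y=2, k_light=2)
    print(len(pairs), "records shipped;", len(reduce_phase(pairs)), "join results")
```

Each matching pair of heavy-hitter records meets at exactly one grid cell, so every join result is produced once. The number of emitted pairs is the communication cost that the choice of x and y trades off.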
