A New Framework for Join Product Skew
Foto Afrati, Victor Kyritsis, Paraskevas V. Lekeas, Dora Souliou

TL;DR
This paper introduces a new framework and algorithm, HJPS, to address join product skew in parallel database joins. It aims to improve load balancing by assigning join sub-tasks according to the frequencies of the join attribute values.
Contribution
The paper presents a static approach based on frequency classes, together with the HJPS algorithm, to mitigate join product skew in shared-nothing architectures.
Findings
The approach reduces load imbalance caused by join product skew.
Frequency-based task assignment improves join performance.
The HJPS algorithm effectively handles skew in experimental scenarios.
Abstract
Different types of data skew can result in load imbalance in the context of parallel joins under the shared-nothing architecture. We study one important type of skew, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the data distribution of the join attribute values is known in advance. It comes from the observation that the join selectivity can be expressed as a sum of products of frequencies of the join attribute values. As a consequence, an appropriate assignment of join sub-tasks that takes into consideration the magnitude of the frequency products can alleviate join product skew. Motivated by this observation, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
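The observation in the abstract can be illustrated with a small sketch. For an equijoin of relations R and S, the result size is the sum over shared join values v of f_R(v) * f_S(v), and the selectivity is that sum divided by |R| * |S|. The code below computes the per-value frequency products and then assigns them to processors with a simple greedy heuristic (largest product first, to the least-loaded processor). This is not the HJPS algorithm, whose details are not given in the abstract; the function names, the greedy rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
import heapq


def frequency_products(r_values, s_values):
    """Return {v: f_R(v) * f_S(v)} for every join value v present in both inputs."""
    f_r, f_s = Counter(r_values), Counter(s_values)
    return {v: f_r[v] * f_s[v] for v in f_r if v in f_s}


def assign_greedy(products, num_procs):
    """Hypothetical assignment rule (not HJPS): give the largest remaining
    frequency product to the currently least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    # Sort by descending product, breaking ties by value for determinism.
    for value, cost in sorted(products.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[p].append(value)
        heapq.heappush(heap, (load + cost, p))
    loads = {p: sum(products[v] for v in vals) for p, vals in assignment.items()}
    return assignment, loads


# Illustrative join attribute columns (made-up data).
R = ["a"] * 4 + ["b"] * 2 + ["c"] + ["d"]
S = ["a"] * 3 + ["b"] * 2 + ["c"] * 2 + ["e"] * 5

products = frequency_products(R, S)
join_size = sum(products.values())
selectivity = join_size / (len(R) * len(S))
assignment, loads = assign_greedy(products, num_procs=2)
```

In this toy input the value "a" alone accounts for 12 of the 18 result tuples, so any assignment that ignores frequency products risks placing "a" together with other heavy values on one processor. Balancing on the products rather than on input tuple counts is the idea the abstract describes.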
