(α, k)-Minimal Sorting and Skew Join in MPI and MapReduce
Silu Huang, Ada Wai-Chee Fu

TL;DR
This paper introduces new (α, k)-minimal algorithms for sorting and skew join in distributed computing, achieving better workload balance and efficiency than existing methods in cluster environments.
Contribution
The paper presents the first (α, k)-minimal algorithms for sorting and skew join that outperform previous algorithms in workload balance and efficiency.
Findings
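The proposed sorting algorithm is around 25% more efficient than the state-of-the-art Terasort algorithm.
Its workload distribution is over 50% more even.
To the best of the authors' knowledge, the proposed sorting and skew join algorithms achieve the best workload balancing guarantee among previous works.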
Abstract
As computer clusters are found to be highly effective for handling massive datasets, the design of efficient parallel algorithms for such a computing model is of great interest. We consider (α, k)-minimal algorithms for such a purpose, where α is the number of rounds in the algorithm, and k is a bound on the deviation from perfect workload balance. We focus on new (α, k)-minimal algorithms for sorting and skew equijoin operations for computer clusters. To the best of our knowledge, the proposed sorting and skew join algorithms achieve the best workload balancing guarantee when compared to previous works. Our empirical study shows that they are close to optimal in workload balancing. In particular, our proposed sorting algorithm is around 25% more efficient than the state-of-the-art Terasort algorithm and achieves significantly more even workload distribution by over…
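To make the two parameters concrete, here is a minimal single-process sketch in Python. It is not the paper's algorithm and not Terasort's implementation; it is a generic sample-based range-partition sort that uses two conceptual rounds (pick splitters from a sample, then route and sort locally). The function names `range_partition_sort` and `imbalance` are hypothetical, and the balance measure used here (largest machine load divided by the perfect share n/t) is an assumption chosen for illustration, since the abstract does not spell out the exact definition of k.

```python
import bisect
import random


def range_partition_sort(items, t, sample_size, seed=0):
    """Sort `items` across `t` simulated machines using sampled splitters.

    Illustrative sketch only: a generic range-partition sort,
    not the algorithm proposed in the paper.
    """
    rng = random.Random(seed)
    # Round 1: draw a random sample and take t-1 evenly spaced splitters from it.
    sample = sorted(rng.sample(items, min(sample_size, len(items))))
    splitters = [sample[(i * len(sample)) // t] for i in range(1, t)]
    # Round 2: route each item to the machine that owns its key range,
    # then sort every machine's bucket locally.
    buckets = [[] for _ in range(t)]
    for x in items:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    return [sorted(b) for b in buckets]


def imbalance(buckets):
    """Largest load divided by the perfect share n/t (assumed balance measure)."""
    n = sum(len(b) for b in buckets)
    t = len(buckets)
    return max(len(b) for b in buckets) / (n / t)


if __name__ == "__main__":
    data = list(range(10_000))
    random.Random(1).shuffle(data)
    parts = range_partition_sort(data, t=8, sample_size=400)
    # Concatenating the buckets in machine order yields the sorted data.
    print("sorted:", [x for b in parts for x in b] == sorted(data))
    print("imbalance ratio:", round(imbalance(parts), 3))
```

A ratio of exactly 1.0 would mean every machine holds n/t items. Under this assumed measure, a smaller sample generally gives a looser ratio, which is the kind of deviation a bound like k is meant to control.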
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Database Systems and Queries · Cloud Computing and Resource Management
