Efficient Multi-way Theta-Join Processing Using MapReduce
Xiaofei Zhang, Lei Chen, Min Wang

TL;DR
This paper presents a cost-effective method for processing multi-way Theta-join queries efficiently using MapReduce, overcoming limitations of traditional approaches in distributed computing environments for large-scale data analysis.
Contribution
It introduces a novel cost model and execution strategy for multi-way Theta-joins in MapReduce, enabling efficient query processing with minimized total processing time.
Findings
Significant improvement in total processing time over existing query evaluation strategies.
Effective execution of chain-typed Theta-joins with a single MapReduce job.
Enhanced support for complex join queries in distributed data processing.
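To make the idea of evaluating a chain-typed Theta-join in a single map/reduce round more concrete, here is a minimal in-memory sketch. It is not the paper's cost model or partitioning strategy: it assumes a simple grid partitioning of the cross-product space, in which each reducer owns one grid cell and every tuple is replicated to all cells that share its coordinate. The function names (`map_phase`, `reduce_phase`, `chain_theta_join`), the round-robin assignment, and the `grid` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product
from operator import lt


def map_phase(relations, grid):
    """Replicate each tuple to every reducer cell that shares its coordinate.

    relations: list of lists of values, one list per relation in the chain.
    grid: number of partitions per relation (assumed, not from the paper).
    """
    buckets = defaultdict(lambda: [[] for _ in relations])
    for axis, (rel, parts) in enumerate(zip(relations, grid)):
        for idx, value in enumerate(rel):
            coord = idx % parts  # hypothetical round-robin partitioning
            ranges = [range(p) for p in grid]
            ranges[axis] = [coord]  # fix this relation's own coordinate
            for cell in product(*ranges):
                buckets[cell][axis].append(value)
    return buckets


def reduce_phase(buckets, predicates):
    """Evaluate the chained predicates locally inside each cell."""
    out = []
    for parts in buckets.values():
        for combo in product(*parts):
            # predicates[i] compares the i-th and (i+1)-th relation's values
            if all(p(combo[i], combo[i + 1]) for i, p in enumerate(predicates)):
                out.append(combo)
    return sorted(out)


def chain_theta_join(relations, predicates, grid):
    """Single simulated map/reduce round for R1 θ1 R2 θ2 ... Rn."""
    return reduce_phase(map_phase(relations, grid), predicates)


if __name__ == "__main__":
    R, S, T = [1, 5, 9], [2, 6], [3, 7, 10]
    # Chain query: R.a < S.b AND S.b < T.c, on a 2 x 2 x 2 reducer grid.
    for row in chain_theta_join([R, S, T], [lt, lt], grid=(2, 2, 2)):
        print(row)
```

Because each combination of tuples falls into exactly one grid cell, every result is produced once and no second job is needed to deduplicate. The trade-off is replication: a finer grid spreads reducer work more evenly but copies each tuple to more cells, and balancing that kind of trade-off is what a cost model is for.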
Abstract
Multi-way Theta-join queries are powerful in describing complex relations and therefore widely employed in real practices. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications over immense data volumes. In this work, we study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of…
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Management and Algorithms
