Optimizing Queries with Many-to-Many Joins
Hasara Kalumin, Amol Deshpande

TL;DR
This paper introduces an improved cost model and optimization algorithms for multi-way many-to-many join queries, especially in graph workloads, demonstrating enhanced efficiency and robustness through extensive experiments.
Contribution
It presents a new cost model and optimization algorithms tailored for complex many-to-many join queries, addressing challenges in graph and cyclic workloads.
Findings
Factorized representation reduces intermediate result size.
Bitvector-based tuple elimination improves query efficiency.
Robustness to join order reduces the need for complex optimization.
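The first two findings can be illustrated with a toy sketch. It is not the paper's implementation: the relations R(a, b) and S(b, c), the function names, and the single-hash bitvector are all assumptions made for illustration. The sketch compares the tuple count of a flat many-to-many join result with the number of values stored when the result is kept per join key as two lists, and then uses a bitvector built over S's join keys to drop R tuples that cannot match.

```python
from collections import defaultdict


def flat_join_size(r, s):
    """Number of tuples in the flat result of R(a, b) JOIN S(b, c)."""
    s_count = defaultdict(int)
    for b, _c in s:
        s_count[b] += 1
    return sum(s_count[b] for _a, b in r)


def factorized_join_size(r, s):
    """Number of a and c values stored when each join key b keeps
    (list of a values) x (list of c values) instead of flat tuples."""
    a_by_b = defaultdict(list)
    c_by_b = defaultdict(list)
    for a, b in r:
        a_by_b[b].append(a)
    for b, c in s:
        c_by_b[b].append(c)
    return sum(len(a_by_b[b]) + len(c_by_b[b]) for b in a_by_b if b in c_by_b)


def bitvector_filter(r, s, num_bits=64):
    """Drop R tuples whose join key cannot match any S tuple.

    One hash position per key (an assumption of this sketch), so the
    filter may keep some non-matching tuples but never drops a match.
    """
    bits = 0
    for b, _c in s:
        bits |= 1 << (hash(b) % num_bits)
    return [(a, b) for a, b in r if bits & (1 << (hash(b) % num_bits))]


# Hypothetical data: key 0 fans out 100 x 100; key 1 has no match in S.
R = [(a, 0) for a in range(100)] + [(a, 1) for a in range(100, 103)]
S = [(0, c) for c in range(100)]

flat = flat_join_size(R, S)
factorized = factorized_join_size(R, S)
survivors = bitvector_filter(R, S)
```

On this input the flat result grows with the product of the two fan-outs for a key, while the factorized form grows with their sum; the bitvector pass removes the three R tuples with key 1 before any join work is done on them.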
Abstract
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional enterprise settings that have been the focus of much of the query optimization work to date, such queries are seen frequently in other contexts such as graph workloads. This has led to much work on developing join algorithms for handling cyclic queries, on compressed (factorized) representations for more efficient storage of intermediate results, and on use of semi-joins or predicate transfer to avoid generating large redundant intermediate results. In this paper, we address a core query optimization problem in this context. Specifically, we introduce an improved cost model that more accurately captures the cost of a query plan in such scenarios, and we…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
