SieveJoin: Boosting Multi-Way Joins with Reusable Bloom Filters

Qingzhi Ma

arXiv:2308.16370·cs.DB·September 1, 2023

Open Access

TL;DR

SieveJoin is a novel multi-way join algorithm that extends Bloomjoin by propagating Bloom filters along join paths, enabling early elimination of useless intermediate results and significantly improving join query efficiency.

Contribution

It introduces SieveJoin, a new multi-way join method that efficiently propagates Bloom filters to reduce intermediate results and enhance performance, extending Bloomjoin's capabilities.

Findings

01. SieveJoin outperforms existing methods in response time.

02. It effectively reduces unnecessary intermediate join results.

03. Experimental results validate its efficiency across datasets.

Abstract

Improving data systems' performance for join operations has long been an issue of great importance. More recently, much attention has been devoted to multi-way join performance, and especially to reducing the negative impact of producing intermediate tuples that in the end do not make it into the final result. We contribute a new multi-way join algorithm, coined SieveJoin, which extends the well-known Bloomjoin algorithm to multi-way joins and achieves state-of-the-art performance in terms of join query execution efficiency. SieveJoin's salient novel feature is that it allows the propagation of Bloom filters in the join path, enabling the system to 'stop early' and eliminate useless intermediate join results. The key design objective of SieveJoin is to efficiently 'learn' the join results, based on Bloom filters, with negligible memory overheads. We discuss the bottlenecks in delaying…
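For illustration only, the general idea of propagating Bloom filters along a join path might be sketched in Python as follows. This is not the paper's implementation: the three-table chain R(a, b) ⋈ S(b, c) ⋈ T(c, d), the toy data, the BloomFilter class, and every helper name are assumptions made for this sketch, and the filter sizes are arbitrary.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical helper): no false negatives, rare false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key!r}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def build_filter(rows, col):
    """Build a Bloom filter over one column of a relation (list of tuples)."""
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row[col])
    return bloom


def filter_by(rows, col, bloom):
    """Keep only rows whose join key might have a partner, according to the filter."""
    return [row for row in rows if row[col] in bloom]


def hash_join(left, right, left_col, right_col):
    """Plain in-memory hash join; output rows are concatenated tuples."""
    index = {}
    for row in right:
        index.setdefault(row[right_col], []).append(row)
    return [l + r for l in left for r in index.get(l[left_col], [])]


def plain_join(R, S, T):
    rs = hash_join(R, S, 1, 0)          # intermediate result R ⋈ S
    return rs, hash_join(rs, T, 3, 0)   # column 3 of (a, b, b, c) is c


def sieved_join(R, S, T):
    # Propagate filters against the join order: T sieves S, the surviving S sieves R.
    s_kept = filter_by(S, 1, build_filter(T, 0))
    r_kept = filter_by(R, 1, build_filter(s_kept, 0))
    rs = hash_join(r_kept, s_kept, 1, 0)
    return rs, hash_join(rs, T, 3, 0)


# Toy data (assumed): T is highly selective, so most of R ⋈ S is wasted work.
R = [(i, i % 50) for i in range(200)]       # R(a, b)
S = [(b, b + 100) for b in range(50)]       # S(b, c)
T = [(c, "t") for c in (100, 101, 102)]     # T(c, d)

plain_mid, plain_final = plain_join(R, S, T)
sieved_mid, sieved_final = sieved_join(R, S, T)
print(len(plain_mid), len(sieved_mid), len(plain_final), len(sieved_final))
```

In this sketch both plans return the same final rows, because Bloom filters never produce false negatives and any false positives are discarded by the exact hash joins. The difference is in the intermediate result: the plain plan materializes every R ⋈ S row before T is consulted, whereas the sieved plan drops most R and S rows before the first join.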

Taxonomy

Topics: Caching and Content Delivery · Data Management and Algorithms · Internet Traffic Analysis and Secure E-voting