Including Bloom Filters in Bottom-up Optimization
Tim Zeyl, Qi Cheng, Reza Pournaghi, Jason Lam, Weicheng Wang, Calvin Wong, Chong Chen, Per-Ake Larson

TL;DR
This paper shows how to integrate Bloom filters into bottom-up cost-based query optimization, which leads to better join orders and a 32.8% further latency reduction for queries involving Bloom filters on a 100 GB TPC-H instance.
Contribution
It introduces a novel method for incorporating Bloom filters into bottom-up cost-based query optimizers, addressing challenges in search space and demonstrating improved query latency.
Findings
Achieved a 32.8% further reduction in latency for queries involving Bloom filters on a 100 GB TPC-H instance
Extended Bloom filter-aware optimization to all query types
Provided heuristics balancing optimization time and query performance
Abstract
Bloom filters are used in query processing to perform early data reduction and improve query performance. The optimal query plan may be different when Bloom filters are used, indicating the need for Bloom filter-aware query optimization. To date, Bloom filter-aware query optimization has only been incorporated in a top-down query optimizer and limited to snowflake queries. In this paper, we show how Bloom filters can be incorporated in a bottom-up cost-based query optimizer. We highlight the challenges in limiting optimizer search space expansion, and offer an efficient solution. We show that including Bloom filters in cost-based optimization can lead to better join orders with effective predicate transfer between operators. On a 100 GB instance of the TPC-H database, our approach achieved a 32.8% further reduction in latency for queries involving Bloom filters, compared to the…
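To make the "early data reduction" idea in the abstract concrete, the following is a minimal Python sketch of a hash join whose probe side is pre-filtered by a Bloom filter built from the build-side keys. It is an illustration only and not the paper's optimizer or implementation: the names `BloomFilter` and `join_with_bloom_filter`, the filter size, and the SHA-256-based hashing are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive one position per hash function by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def join_with_bloom_filter(build_rows, probe_rows):
    """Hash join that drops probe rows the Bloom filter rules out."""
    bloom = BloomFilter()
    hash_table = {}
    for key, payload in build_rows:
        bloom.add(key)
        hash_table.setdefault(key, []).append(payload)

    # Early data reduction: only rows that might match reach the join.
    survivors = [(k, p) for k, p in probe_rows if bloom.might_contain(k)]
    joined = [(k, b, p) for k, p in survivors for b in hash_table.get(k, [])]
    return survivors, joined


build = [(1, "a"), (2, "b")]
probe = [(i, f"p{i}") for i in range(100)]
survivors, joined = join_with_bloom_filter(build, probe)
print(len(probe), len(survivors), joined)
```

In a real engine the filter would typically be applied at the scan of the probe-side table, so the reduced row count changes the cost of every operator above it. That dependency is why the abstract argues the optimizer should account for Bloom filters when choosing a join order.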
