Matrix Bloom Filter: An Efficient Probabilistic Data Structure for 2-tuple Batch Lookup
Yue Fu, Rong Du, Haibo Hu, Man Ho Au, Dagang Li

TL;DR
The paper introduces the matrix Bloom filter, a high-dimensional probabilistic data structure designed for efficient batch insertion and lookup of 2-tuples, improving performance for OLAP and machine learning applications.
Contribution
It presents the matrix Bloom filter and its variants, extending Bloom filters to handle multivariate data and batch operations, with theoretical and empirical validation.
Findings
Superior performance on datasets with common distributions
Degrades gracefully to a standard Bloom filter when no distribution assumptions hold
Supports efficient batch operations for 2-tuples
Abstract
With the growing scale of big data, probabilistic structures are gaining popularity for efficient approximate storage and query processing. For example, Bloom filters (BF) can achieve satisfactory performance for approximate membership queries at the expense of false positives. However, a standard Bloom filter can only handle univariate data and single membership queries, which is insufficient for OLAP and machine learning applications. In this paper, we focus on a common multivariate data type, namely, 2-tuples, or equivalently, key-value pairs. We design the matrix Bloom filter as a high-dimensional extension of the standard Bloom filter. This new probabilistic data structure can not only insert and look up a single 2-tuple efficiently, but also support these operations efficiently in batches, a key requirement for OLAP and machine learning tasks. To further…
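To make the idea of a two-dimensional Bloom filter concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction from the paper: the class name `MatrixBloomSketch`, the helper `_hashes`, the salted SHA-256 hashing, and the parameters `rows`, `cols`, and `k` are all hypothetical. The first element of each 2-tuple selects rows of a bit matrix and the second selects columns. A batch lookup for many tuples sharing the same first element hashes that element once and reuses the combined row.

```python
import hashlib


def _hashes(item, k, m, salt):
    """Return k indices in [0, m) for item, derived from salted SHA-256 digests."""
    indices = []
    for i in range(k):
        digest = hashlib.sha256(f"{salt}:{i}:{item!r}".encode()).digest()
        indices.append(int.from_bytes(digest[:8], "big") % m)
    return indices


class MatrixBloomSketch:
    """Illustrative 2-D bit matrix for 2-tuples (assumed design, not the paper's)."""

    def __init__(self, rows=64, cols=64, k=2):
        self.rows, self.cols, self.k = rows, cols, k
        self.bits = [[False] * cols for _ in range(rows)]

    def insert(self, x, y):
        # Set every cell at the intersection of x's rows and y's columns.
        for r in _hashes(x, self.k, self.rows, "row"):
            for c in _hashes(y, self.k, self.cols, "col"):
                self.bits[r][c] = True

    def contains(self, x, y):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[r][c]
            for r in _hashes(x, self.k, self.rows, "row")
            for c in _hashes(y, self.k, self.cols, "col")
        )

    def batch_contains(self, x, ys):
        """Look up many (x, y) pairs that share x, hashing x only once."""
        row_idx = _hashes(x, self.k, self.rows, "row")
        # AND the selected rows into a single combined row.
        combined = [all(self.bits[r][c] for r in row_idx) for c in range(self.cols)]
        return [
            all(combined[c] for c in _hashes(y, self.k, self.cols, "col"))
            for y in ys
        ]
```

As with a standard Bloom filter, an inserted tuple is always reported as present, while a tuple that was never inserted may occasionally be reported as present too. The sketch makes no attempt to reproduce the paper's variants or its storage and false-positive analysis.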
Taxonomy
Topics: Caching and Content Delivery · Carbon and Quantum Dots Applications · Covalent Organic Framework Applications
