RAMBO: Repeated And Merged BloOm Filter for Ultra-fast Multiple Set Membership Testing (MSMT) on Large-Scale Data
Gaurav Gupta, Minghao Yan, Benjamin Coleman, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava

TL;DR
RAMBO is a novel data structure that significantly accelerates multiple set membership testing on large datasets by reducing query time and maintaining high accuracy, outperforming existing genome indexing methods.
Contribution
Introduces RAMBO, a new data structure that achieves faster query times and better scalability for MSMT on large-scale data, with a simple, parallelizable implementation.
Findings
RAMBO achieves O(√K log K) query time in expectation.
RAMBO outperforms state-of-the-art genome indexing methods.
Indexing 170 TB genome data takes only 14 hours with RAMBO.
Abstract
Multiple Set Membership Testing (MSMT) is a well-known problem in a variety of search and query applications. Given a dataset of K different sets and a query q, it aims to find all of the sets containing the query. Trivially, an MSMT instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time with a simple array of Bloom Filters. We propose a data structure called RAMBO (Repeated And Merged BloOm Filter) that achieves O(√K log K) query time in expectation with an additional worst-case memory cost factor of O(log K) beyond the array of Bloom Filters. Due to this, RAMBO is a very fast and accurate data structure. Apart from being embarrassingly parallel, supporting cheap updates for streaming inputs, and offering a zero false-negative rate and a low false-positive rate, RAMBO beats the state-of-the-art approaches for genome indexing methods: COBS…
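The abstract names the structure but does not spell out its construction. The sketch below is one minimal reading of "Repeated And Merged", under stated assumptions: in each of R repetitions, every set is hashed into one of B buckets, and each bucket holds a single Bloom filter that merges all sets assigned to it. A query probes the B filters of each repetition, takes the union of the sets in the buckets that hit, and intersects those unions across repetitions. The class names `BloomFilter` and `RamboSketch`, the SHA-256 based hashing, and all parameter values are illustrative choices, not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1 << 14, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class RamboSketch:
    """Hypothetical R x B grid of merged Bloom filters (assumed layout)."""

    def __init__(self, num_buckets, num_repetitions):
        self.B = num_buckets
        self.R = num_repetitions
        self.filters = [[BloomFilter() for _ in range(self.B)]
                        for _ in range(self.R)]
        # members[rep][b] = ids of the sets merged into that bucket.
        self.members = [[set() for _ in range(self.B)]
                        for _ in range(self.R)]

    def _bucket(self, set_id, rep):
        # Independent set-to-bucket assignment for each repetition.
        digest = hashlib.sha256(f"rep{rep}:{set_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.B

    def insert(self, set_id, item):
        for rep in range(self.R):
            b = self._bucket(set_id, rep)
            self.filters[rep][b].add(item)
            self.members[rep][b].add(set_id)

    def query(self, item):
        candidates = None
        for rep in range(self.R):
            # Union of all sets living in buckets whose filter reports a hit.
            hit = set()
            for b in range(self.B):
                if item in self.filters[rep][b]:
                    hit |= self.members[rep][b]
            # Intersect across repetitions to prune false candidates.
            candidates = hit if candidates is None else candidates & hit
            if not candidates:
                break
        return candidates or set()


index = RamboSketch(num_buckets=4, num_repetitions=3)
data = {"s0": ["ACGT", "TTGA"], "s1": ["ACGT"], "s2": ["GGCC"]}
for set_id, items in data.items():
    for item in items:
        index.insert(set_id, item)

result = index.query("ACGT")  # always a superset of {"s0", "s1"}
```

Because Bloom filters never report a false negative, the returned set always contains every set that truly holds the query; it may also contain extra sets when buckets collide or a filter yields a false positive. Under this layout a query probes B × R filters rather than K, which is where a sublinear query time can come from when B grows like √K and R like log K.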
Taxonomy
Topics: Caching and Content Delivery · Network Packet Processing and Optimization · Algorithms and Data Compression
