A Bloom filter based semi-index on $q$-grams
Szymon Grabowski, Robert Susik, Marcin Raniszewski

TL;DR
This paper introduces a Bloom filter based semi-index on $q$-grams that confines pattern searches to a small fraction of text blocks, running up to three orders of magnitude faster than the Claude et al. semi-index at a comparable space usage.
Contribution
The paper proposes a novel Bloom filter based semi-index for q-grams, offering improved speed and space tradeoffs over existing semi-index approaches.
Findings
Up to 1000x (three orders of magnitude) faster search than the Claude et al. semi-index on Pizza & Chili datasets.
Space usage comparable to that of the Claude et al. semi-index.
A pattern typically has to be searched for in only a small fraction of text blocks.
Abstract
We present a simple $q$-gram based semi-index, which makes it possible to look for a pattern typically in only a small fraction of text blocks. Several space-time tradeoffs are presented. Experiments on Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the Claude et al. \cite{CNPSTjda10} semi-index at a comparable space usage.
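The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names, the hashing scheme (double hashing over a BLAKE2b digest), the parameter defaults, and the handling of block boundaries (each block's filter also covers the next `max_pattern_len - 1` bytes) are all assumptions made for illustration. Each text block gets its own Bloom filter over its $q$-grams, and a block is scanned only if every $q$-gram of the pattern passes that block's filter.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; the hashing scheme is an illustrative assumption."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class QGramSemiIndex:
    """Hypothetical semi-index: one Bloom filter of q-grams per text block."""

    def __init__(self, text, q=4, block_size=64, max_pattern_len=32,
                 bits_per_block=1024, num_hashes=3):
        self.text = text
        self.q = q
        self.block_size = block_size
        self.max_pattern_len = max_pattern_len
        self.filters = []
        for start in range(0, len(text), block_size):
            bf = BloomFilter(bits_per_block, num_hashes)
            # Assumption: the filter also covers max_pattern_len - 1 bytes past
            # the block, so an occurrence starting in this block lies fully
            # inside the region whose q-grams were inserted.
            end = min(len(text), start + block_size + max_pattern_len - 1)
            for i in range(start, end - q + 1):
                bf.add(text[i:i + q])
            self.filters.append(bf)

    def candidate_blocks(self, pattern):
        """Indices of blocks whose filter contains every q-gram of the pattern."""
        m = len(pattern)
        if not self.q <= m <= self.max_pattern_len:
            raise ValueError("pattern length must be in [q, max_pattern_len]")
        qgrams = {pattern[i:i + self.q] for i in range(m - self.q + 1)}
        return [b for b, bf in enumerate(self.filters)
                if all(g in bf for g in qgrams)]

    def search(self, pattern):
        """Start positions of all occurrences; only candidate blocks are scanned."""
        m = len(pattern)
        hits = []
        for b in self.candidate_blocks(pattern):
            start = b * self.block_size
            # The window admits exactly the occurrences that start in block b.
            window = self.text[start:start + self.block_size + m - 1]
            pos = window.find(pattern)
            while pos != -1:
                hits.append(start + pos)
                pos = window.find(pattern, pos + 1)
        return hits
```

Because a Bloom filter never reports a false negative, the sketch finds every occurrence; a false positive only costs one unnecessary block scan. Shrinking `bits_per_block` saves space at the price of more such scans, which is one way to picture the kind of space-time tradeoff the abstract mentions.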
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Caching and Content Delivery
