Full-text and Keyword Indexes for String Searching
Aleksander Cisłak

TL;DR
This paper reviews full-text and keyword indexes, introduces the FM-bloated index, which speeds up string searching at the cost of higher space use, and presents the split index for efficient approximate matching, demonstrating significant speed improvements.
Contribution
It introduces the FM-bloated index with space-speed trade-offs and the split index for fast k-mismatch queries, along with practical implementation insights.
Findings
FM-bloated index achieves faster searches at a high space cost.
Split index efficiently solves the k-mismatch problem, especially for a single error (k = 1).
Query times of about 1 microsecond for small dictionaries.
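The pigeonhole idea behind a split index for k-mismatch queries can be sketched in a few lines of Python. This is a hypothetical toy under our own assumptions, not the author's implementation: `ToySplitIndex`, `split_parts` and `hamming` are illustrative names, the index is a single dict keyed on (word length, part number, part text), it stores whole words, and no attempt is made at a compact memory layout. Each word is cut into k + 1 parts; a word within Hamming distance k of the pattern must match it exactly on at least one part, so only words sharing a part are verified.

```python
from collections import defaultdict


def split_parts(word: str, k: int) -> list[str]:
    """Cut word into k + 1 contiguous parts whose lengths differ by at most 1."""
    base, extra = divmod(len(word), k + 1)
    parts, start = [], 0
    for j in range(k + 1):
        end = start + base + (1 if j < extra else 0)
        parts.append(word[start:end])
        start = end
    return parts


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class ToySplitIndex:
    """Hypothetical toy dictionary index for k-mismatch (Hamming) queries."""

    def __init__(self, words, k: int = 1):
        self.k = k
        # (word length, part number, part text) -> words containing that part
        self.table = defaultdict(list)
        for w in set(words):
            for j, part in enumerate(split_parts(w, k)):
                self.table[(len(w), j, part)].append(w)

    def query(self, pattern: str) -> list[str]:
        """All dictionary words within Hamming distance k of pattern."""
        hits = set()
        for j, part in enumerate(split_parts(pattern, self.k)):
            # Pigeonhole: at most k mismatches spread over k + 1 parts leave
            # at least one part of a matching word identical to the pattern's.
            for w in self.table.get((len(pattern), j, part), []):
                if hamming(w, pattern) <= self.k:  # verify the candidate
                    hits.add(w)
        return sorted(hits)
```

For instance, with the dictionary `["cat", "car", "cut", "dog", "bat"]` and k = 1, `query("cat")` returns `["bat", "car", "cat", "cut"]`, while `"dog"` is never verified because it shares no part with the pattern.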
Abstract
In this work, we present a literature review of full-text and keyword indexes as well as our contributions (which are mostly practice-oriented). The first contribution is the FM-bloated index, which is a modification of the well-known FM-index (a compressed, full-text index) that trades space for speed. In our approach, the count table and the occurrence lists store information about selected $q$-grams in addition to the individual characters. Two variants are described, namely one using $O(n \log^2 n)$ bits of space with $O(m + \log m \log \log n)$ average query time, and one with linear space and $O(m \log \log n)$ average query time, where $n$ is the input text length and $m$ is the pattern length. We experimentally show that a significant speedup can be achieved by operating on $q$-grams (albeit at the cost of very high space requirements, hence the name "bloated"). In the…
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Network Packet Processing and Optimization
