FM-index for dummies
Szymon Grabowski, Marcin Raniszewski, Sebastian Deorowicz

TL;DR
This paper introduces a new cache-friendly implementation of the FM-index's rank primitive, significantly increasing search speed at the cost of increased space, benefiting practical full-text pattern searching.
Contribution
It proposes a simple, faster FM-index variant with a cache-friendly rank implementation, balancing speed and space for improved practical performance.
Findings
2-3 times faster than the fastest previously known FM-index variants
Typically uses 1.5-5 times more space than those variants
Effective for full-text pattern searching
Abstract
The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a new, cache-friendly implementation of the rank primitive and advocate for a very simple architecture of the FM-index, which trades compression ratio for speed. Experimental results show that our variants are 2-3 times faster than the fastest known ones, for the price of using typically 1.5-5 times more space.
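The abstract does not describe the layout of the proposed rank structure, so the following is only a generic sketch of one common way to make rank on a bit-vector cache-friendly: each block's precomputed counter is stored right next to the bits it summarizes, so a query reads one contiguous block instead of two separate arrays. The block size (one 64-bit counter plus seven 64-bit data words, 64 bytes in total) and the names `InterleavedRank` and `rank1` are assumptions made for illustration, not the authors' design.

```python
from typing import List

WORD_BITS = 64
WORDS_PER_BLOCK = 7  # assumed: 1 counter + 7 data words = 64 bytes per block
BLOCK_BITS = WORD_BITS * WORDS_PER_BLOCK


class InterleavedRank:
    """Bit-vector whose rank counters sit next to the bits they summarize."""

    def __init__(self, bits: List[int]) -> None:
        self.n = len(bits)
        # One extra block so that rank1(n) always has a counter to read.
        n_blocks = self.n // BLOCK_BITS + 1
        self.store: List[int] = []  # flat layout: [counter, w0..w6, counter, ...]
        ones_so_far = 0
        for b in range(n_blocks):
            self.store.append(ones_so_far)  # number of 1s before this block
            for w in range(WORDS_PER_BLOCK):
                start = b * BLOCK_BITS + w * WORD_BITS
                word = 0
                for offset, bit in enumerate(bits[start:start + WORD_BITS]):
                    if bit:
                        word |= 1 << offset
                self.store.append(word)
                ones_so_far += bin(word).count("1")

    def rank1(self, i: int) -> int:
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, in_block = divmod(i, BLOCK_BITS)
        base = block * (WORDS_PER_BLOCK + 1)
        count = self.store[base]  # counter and data share the same block
        full_words, rem = divmod(in_block, WORD_BITS)
        for w in range(full_words):
            count += bin(self.store[base + 1 + w]).count("1")
        if rem:
            mask = (1 << rem) - 1  # keep only the low `rem` bits
            count += bin(self.store[base + 1 + full_words] & mask).count("1")
        return count
```

In a low-level language the same layout would let a rank query touch a single cache line, at the cost of storing one counter per 448 data bits. Python is used here only to show the indexing, not the speed.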
Taxonomy
Topics: Algorithms and Data Compression · Information Retrieval and Search Behavior · Data Management and Algorithms
