FM-index for dummies

Szymon Grabowski; Marcin Raniszewski; Sebastian Deorowicz

arXiv:1506.04896 · cs.DS · October 27, 2015

TL;DR

This paper introduces a new cache-friendly implementation of the rank primitive used by the FM-index. The resulting variants search 2-3 times faster than the fastest previously known ones, at the cost of typically 1.5-5 times more space.

Contribution

The paper proposes a cache-friendly implementation of the rank primitive and advocates a very simple FM-index architecture that trades compression ratio for speed.
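To make the role of rank concrete, here is a minimal sketch of FM-index pattern counting by backward search. It is a textbook illustration, not the paper's implementation: the function names (`bwt_from_text`, `build_index`, `count`) are hypothetical, the BWT is built naively from sorted rotations, and the rank queries are answered from plain precomputed tables rather than from a compact bit-vector structure.

```python
def bwt_from_text(text):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes the text does not contain the sentinel character "\0".
    """
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def build_index(text):
    """Return (C, occ, n) for backward search (illustrative, not compact)."""
    bwt = bwt_from_text(text)
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c] = number of symbols in the indexed string strictly smaller than c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = rank of c at position i, i.e. occurrences of c in bwt[0:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(bwt)


def count(index, pattern):
    """Count occurrences of pattern; two rank queries per pattern symbol."""
    C, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each pattern symbol costs two rank lookups at positions that are usually far apart in memory, which is why the speed of rank, and its cache behaviour in particular, dominates query time.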

Findings

1. 2-3 times faster than the fastest known FM-index variants
2. Typically uses 1.5-5 times more space
3. Aimed at practical full-text pattern searching, trading compression ratio for speed

Abstract

The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a new, cache-friendly implementation of the rank primitive and advocate for a very simple architecture of the FM-index, which trades compression ratio for speed. Experimental results show that our variants are 2-3 times faster than the fastest known ones, for the price of using typically 1.5-5 times more space.
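One common way to make rank cache-friendly is to store each cumulative counter next to the bits it covers, so that a query touches a single contiguous block. The sketch below illustrates that general idea only. The block shape (one counter word followed by seven 64-bit data words, 64 bytes in total) and the class name `InterleavedRank` are assumptions chosen for illustration, not details taken from the paper, and a Python list does not reproduce real cache-line behaviour.

```python
WORDS_PER_BLOCK = 7            # assumed: 7 data words + 1 counter word = 64 bytes
BITS_PER_BLOCK = 64 * WORDS_PER_BLOCK


class InterleavedRank:
    """Bit vector with rank counters interleaved with the data words."""

    def __init__(self, bits):
        # bits: a list of 0/1 values
        self.n = len(bits)
        # Flat layout: [counter, w0..w6, counter, w0..w6, ...]
        self.blocks = []
        ones_so_far = 0
        # The "+ 1" guarantees a block exists for the query rank1(n).
        for start in range(0, self.n + 1, BITS_PER_BLOCK):
            self.blocks.append(ones_so_far)  # ones before this block
            for w in range(WORDS_PER_BLOCK):
                base = start + 64 * w
                word = 0
                for j, b in enumerate(bits[base:base + 64]):
                    word |= (b & 1) << j
                self.blocks.append(word)
                ones_so_far += bin(word).count("1")

    def rank1(self, i):
        """Number of 1 bits in positions [0, i), for 0 <= i <= n."""
        block, offset = divmod(i, BITS_PER_BLOCK)
        base = block * (WORDS_PER_BLOCK + 1)
        result = self.blocks[base]  # counter stored next to its data words
        full_words, rem = divmod(offset, 64)
        for w in range(full_words):
            result += bin(self.blocks[base + 1 + w]).count("1")
        if rem:
            mask = (1 << rem) - 1
            result += bin(self.blocks[base + 1 + full_words] & mask).count("1")
        return result
```

In a low-level language the counter and its data words would sit in one aligned 64-byte block and the per-word counts would use a hardware popcount instruction. The extra counter in every block is also one source of the space overhead that such a layout trades for speed.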

Code & Models

Repositories

mranisz/fmdummy (official)

Taxonomy

Topics: Algorithms and Data Compression · Information Retrieval and Search Behavior · Data Management and Algorithms