EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices
Christopher Pockrandt, Marcel Ehrhardt, Knut Reinert

TL;DR
This paper presents the EPR-dictionary, a new data structure that enables constant-time search steps in unidirectional and bidirectional FM-indices, speeding up bioinformatics applications such as read mapping.
Contribution
The EPR-dictionary replaces the binary wavelet tree in the FM-index, giving constant-time search steps. The authors implemented it in the SeqAn C++ library and validated it experimentally.
Findings
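Search was 2.6 to 4.8 times faster than other freely available implementations of bidirectional indices in the authors' experiments.
First implementation, to the authors' knowledge, of a constant-time search step in 2FM indices.
Experiments with the SeqAn C++ implementation confirmed the theoretical results in practice.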
Abstract
We introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in $O(1)$ time per step while using $O(\log\sigma \cdot n + \sigma^2)$ bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition, we compared our implementation with other freely available implementations of bidirectional indices and show that we are between 2.6 and 4.8 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices, e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.
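The abstract does not describe how the EPR-dictionary is laid out, so the following is only a toy sketch of the general idea its name suggests: store, at block boundaries, prefix sums of character counts ("how many symbols less than or equal to c occur before this position"), and answer a rank query from one stored value plus a short in-block scan. The class name `PrefixSumRank`, the `block_size` parameter, and the integer encoding A=0, C=1, G=2, T=3 are all assumptions made for illustration, and the in-block loop stands in for the word-level operations a practical implementation would presumably use.

```python
from typing import List


class PrefixSumRank:
    """Toy prefix-sum rank dictionary over the integer alphabet [0, sigma).

    Hypothetical illustration only; this is not the SeqAn implementation.
    """

    def __init__(self, text: List[int], sigma: int, block_size: int = 4) -> None:
        self.text = text
        self.sigma = sigma
        self.block_size = block_size
        # blocks[b][c] = number of symbols <= c in text[0 : b * block_size]
        self.blocks: List[List[int]] = []
        running = [0] * sigma
        for i, symbol in enumerate(text):
            if i % block_size == 0:
                self.blocks.append(list(running))
            # A symbol s contributes to every prefix count with c >= s.
            for c in range(symbol, sigma):
                running[c] += 1
        # Extra boundary entry so that queries with i == len(text) work.
        if len(text) % block_size == 0:
            self.blocks.append(list(running))

    def prefix_rank(self, c: int, i: int) -> int:
        """Number of symbols <= c in text[0:i]."""
        if c < 0:
            return 0
        b = i // self.block_size
        count = self.blocks[b][c]
        # Scan at most block_size - 1 symbols inside the block.
        for symbol in self.text[b * self.block_size : i]:
            if symbol <= c:
                count += 1
        return count

    def rank(self, c: int, i: int) -> int:
        """Number of occurrences of symbol c in text[0:i]."""
        return self.prefix_rank(c, i) - self.prefix_rank(c - 1, i)


# Usage: "ACGTTGCA" with the assumed encoding A=0, C=1, G=2, T=3.
encoded = [0, 1, 2, 3, 3, 2, 1, 0]
rd = PrefixSumRank(encoded, sigma=4)
g_count = rd.rank(2, 6)           # occurrences of G in "ACGTTG"
smaller = rd.prefix_rank(1, 5)    # symbols <= C in "ACGTT"
```

With a fixed block size, each query reads one stored entry and scans a bounded number of symbols, so its cost does not depend on the text length. The same stored prefix sums also give the number of symbols smaller than c in a range by subtraction, which is the quantity a bidirectional FM index needs to keep its two ranges synchronized.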
Taxonomy
Topics: Algorithms and Data Compression · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
