FMtree: A fast locating algorithm of FM-indexes for genomic data
Haoyu Cheng, Ming Wu, Yun Xu

TL;DR
FMtree is a novel quadtree-based locating algorithm that significantly accelerates pattern occurrence retrieval in FM-indexes for genomic data, outperforming existing methods by an order of magnitude while maintaining memory efficiency.
Contribution
The paper introduces FMtree, a new quadtree-based locating algorithm that improves the speed of pattern occurrence retrieval in FM-indexes for genomic data.
Findings
FMtree locates pattern occurrences about ten times faster than previous locating algorithms for FM-indexes.
FMtree maintains memory efficiency despite increased speed.
Experimental results validate FMtree's superior performance.
Abstract
Motivation: Searching for massive numbers of short patterns in a long text is a fundamental task in bioinformatics, and it is widely accelerated by compressed full-text indexes. These indexes provide search functionality similar to that of classical indexes, e.g., suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, offers unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of a pattern in a given text, are particularly slow, especially for patterns with many occurrences.
Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of…
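To make the drawback described in the Motivation concrete, the following is a minimal sketch of the conventional locating procedure on a toy FM-index, not of FMtree itself. It assumes a sampled suffix array in which only text positions that are multiples of `sample_rate` are stored; every other occurrence is resolved by walking backwards with LF-mapping until a sampled row is reached. All names (`build_fm_index`, `count_interval`, `locate`, `sample_rate`) are hypothetical and chosen for illustration, and the index is built naively, which is only practical for short strings.

```python
def build_fm_index(text, sample_rate=4):
    """Build a toy FM-index over text + '$' (naive suffix sorting, illustration only)."""
    t = text + "$"  # '$' is assumed not to occur in text and sorts before A, C, G, T
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])  # full suffix array
    bwt = [t[i - 1] for i in sa]                # t[-1] is '$' when i == 0
    alphabet = sorted(set(t))
    # C[c] = number of characters in t that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += t.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (n + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    # Keep only suffix array entries whose text position is a multiple of sample_rate
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return {"bwt": bwt, "C": C, "occ": occ, "sampled": sampled, "n": n}


def count_interval(idx, pattern):
    """Backward search: return the half-open suffix array interval [lo, hi) of pattern."""
    lo, hi = 0, idx["n"]
    for c in reversed(pattern):
        if c not in idx["C"]:
            return 0, 0
        lo = idx["C"][c] + idx["occ"][c][lo]
        hi = idx["C"][c] + idx["occ"][c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(idx, pattern):
    """Report all occurrence positions, plus the total number of LF-mapping steps used."""
    lo, hi = count_interval(idx, pattern)
    positions, lf_steps = [], 0
    for row in range(lo, hi):
        steps = 0
        # Each occurrence is resolved independently: step to the preceding
        # text position until a row with a stored sample is found.
        while row not in idx["sampled"]:
            c = idx["bwt"][row]
            row = idx["C"][c] + idx["occ"][c][row]
            steps += 1
        positions.append(idx["sampled"][row] + steps)
        lf_steps += steps
    return sorted(positions), lf_steps


idx = build_fm_index("ACGTACGTTACG", sample_rate=4)
positions, lf_steps = locate(idx, "ACG")
print(positions)  # [0, 4, 9]
```

In this sketch the work in `locate` grows with the number of occurrences times the sampling distance, because every row in the interval performs its own walk. That per-occurrence cost is the behaviour the abstract identifies as slow for patterns with many occurrences.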
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics
