Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space
Travis Gagie, Gonzalo Navarro, and Nicola Prezza

TL;DR
This paper extends the Run-Length FM-index to efficiently locate pattern occurrences and supports various text operations within space proportional to the BWT's run count, enabling optimal pattern matching in highly repetitive texts.
Contribution
It extends the Run-Length FM-index so that each pattern occurrence can be located in loglogarithmic time within O(r) space, closing a long-standing open problem, and it also reports optimal search times, advancing the indexing of repetitive texts.
Findings
Supports pattern counting in optimal time O(m).
Locates pattern occurrences in loglogarithmic time per occurrence within O(r) space.
Provides full suffix tree functionality within compressed space.
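For readers new to FM-indexes, counting by backward search can be sketched as follows. This is a minimal illustration of the classic FM-index idea, not the paper's data structure: the function name `backward_count` is hypothetical, and rank is computed naively with `str.count`, so each of the m steps costs time linear in the text rather than the constant or loglogarithmic time a real index achieves.

```python
from collections import Counter


def backward_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` using only the BWT of the text.

    Illustrative sketch: rank is done by scanning (str.count), which a
    real run-length FM-index replaces with compact rank structures.
    """
    # C[c] = number of symbols in the text that are smaller than c.
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # Half-open suffix-array interval [lo, hi) of suffixes prefixed by
    # the pattern suffix processed so far; initially all suffixes.
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):  # one LF-mapping step per pattern symbol
        if c not in C:
            return 0
        lo = C[c] + bwt_str.count(c, 0, lo)  # rank_c(bwt, lo)
        hi = C[c] + bwt_str.count(c, 0, hi)  # rank_c(bwt, hi)
        if lo >= hi:
            return 0
    return hi - lo


# "annb$aa" is the BWT of "banana$" (with "$" as the smallest symbol).
print(backward_count("annb$aa", "ana"))
```

The loop performs exactly m interval updates, which is why counting time is usually stated per pattern symbol; the cost of each update depends on how rank is implemented.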
Abstract
Indexing highly repetitive texts - such as genomic databases, software repositories and versioned text collections - has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of r. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently within O(r) space (in loglogarithmic time each), and…
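To make the measure r concrete, the sketch below computes a BWT by sorting rotations and counts its runs. It is only an illustration under simple assumptions: the helper names `bwt` and `count_runs` are hypothetical, the sentinel `$` is assumed to be absent from the text and smaller than every other symbol, and sorting rotations is far too slow for real collections.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (illustration only)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal symbols; this is r when s is a BWT."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


small = bwt("banana")
print(small, count_runs(small))            # annb$aa 5

repetitive = bwt("ab" * 50)                # 100 symbols plus the sentinel
print(len(repetitive), count_runs(repetitive))  # 101 3
```

The second example shows why r suits repetitive inputs: the text has 101 symbols including the sentinel, yet its BWT consists of only three runs, so space bounded in terms of r can be much smaller than the text length.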
