Suffixient Sets
Lore Depuydt, Travis Gagie, Ben Langmead, Giovanni Manzini, Nicola Prezza

TL;DR
This paper introduces the concept of suffixient sets for texts, providing bounds on their size and a new indexing method that efficiently finds maximal exact matches using the Burrows-Wheeler Transform and straight-line programs.
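To make concrete what the index computes, here is a brute-force reference for maximal exact matches. It assumes the usual definition: a substring of P that occurs in T and cannot be extended one character to the left or right in P while still occurring in T. It is deliberately slow and is meant only as a correctness oracle. The function name `mems_bruteforce` and the half-open 0-based output format are our own choices, not the paper's.

```python
from typing import List, Tuple


def mems_bruteforce(t: str, p: str) -> List[Tuple[int, int]]:
    """Return every MEM of p with respect to t as a half-open range (i, j) of p.

    p[i:j] is reported when it occurs in t, cannot be extended left
    (i == 0 or p[i-1:j] does not occur in t), and cannot be extended right
    (j == len(p) or p[i:j+1] does not occur in t).
    """
    m = len(p)
    result: List[Tuple[int, int]] = []
    for i in range(m):
        for j in range(i + 1, m + 1):
            if p[i:j] not in t:
                break  # no longer substring starting at i can occur in t either
            left_maximal = i == 0 or p[i - 1:j] not in t
            right_maximal = j == m or p[i:j + 1] not in t
            if left_maximal and right_maximal:
                result.append((i, j))
    return result
```

For T = banana and P = bandana this returns [(0, 3), (4, 7)], i.e. the MEMs ban and ana.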
Contribution
The paper defines suffixient sets, proves bounds on their size, and develops a compressed index for computing maximal exact matches, whose space depends on the number of BWT runs of the reversed text and the size of a straight-line program for the text.
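The index is built from a straight-line program (SLP) for T with g rules. As a reminder of what that is, here is a minimal sketch assuming one common encoding: each rule is either a single terminal character or the concatenation of two earlier rules, and the last rule derives T. It also shows random access to T[i] by descending from the start rule without expanding T, in time proportional to the height of the derivation. The rule format and the names `rule_lengths`, `access`, `expand`, and `banana_rules` are hypothetical; we do not claim this is how the paper's index queries its SLP.

```python
from typing import List, Tuple, Union

# A rule is a terminal character or a pair (left, right) of earlier rule indices.
Rule = Union[str, Tuple[int, int]]


def rule_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string each rule derives (rules only refer to earlier rules)."""
    lengths: List[int] = []
    for r in rules:
        if isinstance(r, str):
            lengths.append(1)
        else:
            left, right = r
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules: List[Rule], lengths: List[int], i: int) -> str:
    """Return T[i] (0-based, 0 <= i < len(T)) by walking down from the start rule."""
    node = len(rules) - 1  # the last rule is the start symbol
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left
        else:
            i -= lengths[left]
            node = right
    return rules[node]


def expand(rules: List[Rule], node: int) -> str:
    """Fully expand one rule (only sensible for small examples)."""
    r = rules[node]
    if isinstance(r, str):
        return r
    return expand(rules, r[0]) + expand(rules, r[1])


# Example: an SLP with g = 7 rules deriving "banana".
banana_rules: List[Rule] = ["b", "a", "n", (1, 2), (3, 3), (4, 1), (0, 5)]
```

Here rule 3 derives an, rule 4 derives anan, rule 5 derives anana, and rule 6 derives banana.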
Findings
There is a suffixient set of size at most twice the number of runs in the BWT of the reversed text.
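The proposed index uses $O(\bar{r} + g)$ space, where $\bar{r}$ is the number of runs in the BWT of the reversed text and $g$ is the number of rules in the straight-line program (SLP) for the text.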
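The index finds the MEMs of a pattern $P[1..m]$ in $O\left(m \frac{\log \sigma}{\log n} + d \log n\right)$ time, where $n$ is the text length and $\sigma$ and $d$ are as defined in the abstract.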
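One way to see where a bound of $2\bar{r}$ can come from, sketched under our own assumptions (this is not claimed to be the paper's construction or proof): sorting the prefixes of T colexicographically and listing the character that follows each prefix yields the BWT of the reversed text. The prefixes ending in a fixed string α are contiguous in that order, so if α is followed by at least two distinct characters, each of those characters has an occurrence on one side of a run boundary inside that range. Keeping the positions on both sides of every run boundary therefore gives at most $2(\bar{r} - 1)$ positions. The helper names, the 1-based positions over T$ (with the sentinel '$' at position n + 1), and the assumption that '$' does not occur in T are ours; the quadratic-time sort is for clarity only.

```python
from typing import List, Set, Tuple


def colex_prefix_bwt(t: str) -> Tuple[List[int], str]:
    """Sort the prefixes t[:k] (k = 0..n) colexicographically and list the
    character that follows each one ('$' after the whole text).

    Assumes '$' does not occur in t and acts as a unique sentinel smaller than
    every character; the returned string is then the BWT of reverse(t) + '$'.
    """
    n = len(t)
    # Comparing reversed prefixes puts a shorter prefix first when it is a
    # suffix of a longer one, matching the sentinel-is-smallest convention.
    order = sorted(range(n + 1), key=lambda k: t[:k][::-1])
    bwt = "".join(t[k] if k < n else "$" for k in order)
    return order, bwt


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def suffixient_from_runs(t: str) -> Set[int]:
    """Candidate suffixient set for t + '$' (1-based positions), of size < 2 * runs."""
    order, bwt = colex_prefix_bwt(t)
    positions: Set[int] = set()
    for i in range(1, len(bwt)):
        if bwt[i] != bwt[i - 1]:
            # Prefix t[:k] is followed by the character at 1-based position k + 1.
            positions.add(order[i - 1] + 1)
            positions.add(order[i] + 1)
    return positions
```

For T = banana the colex-sorted BWT is bnn$aaa, so $\bar{r} = 4$, and the sketch returns {1, 2, 3, 5, 7}, five positions against the bound of 8.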
Abstract
We define a suffixient set for a text $T[1..n]$ to be a set $S$ of positions between 1 and $n$ such that, for any edge descending from a node $u$ to a node $v$ in the suffix tree of $T$, there is an element $s \in S$ such that $u$'s path label is a suffix of $T[1..s-1]$ and $T[s]$ is the first character of $(u, v)$'s edge label. We first show there is a suffixient set of cardinality at most $2\bar{r}$, where $\bar{r}$ is the number of runs in the Burrows-Wheeler Transform of the reverse of $T$. We then show that, given a straight-line program for $T$ with $g$ rules, we can build an $O(\bar{r} + g)$-space index with which, given a pattern $P[1..m]$, we can find the maximal exact matches (MEMs) of $P$ with respect to $T$ in $O\left(m \frac{\log \sigma}{\log n} + d \log n\right)$ time, where $\sigma$ is the size of the alphabet and $d$ is the number of times we would fully or partially descend…
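The definition above can be checked by brute force on small inputs. This sketch assumes the text already ends with a unique sentinel such as '$', so every suffix ends at a leaf; the branching nodes of the suffix tree are then the root and the substrings followed by at least two distinct characters. The condition that $u$'s path label $\alpha$ is a suffix of $T[1..s-1]$ with $T[s] = c$ is the same as $T[1..s]$ ending in $\alpha c$. The function name `is_suffixient` is our own, and the enumeration is far too slow for real texts.

```python
from typing import Iterable


def is_suffixient(x: str, positions: Iterable[int]) -> bool:
    """Brute-force test of the suffixient-set definition for a sentinel-terminated x.

    Positions are 1-based. For every suffix-tree node with path label alpha and
    every character c that starts an edge below it, some s must satisfy
    x[1..s] ends with alpha + c.
    """
    chosen = set(positions)
    n = len(x)
    substrings = {x[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    for alpha in substrings:
        # Characters that follow some occurrence of alpha inside x.
        ext = {x[p + len(alpha)] for p in range(n - len(alpha)) if x.startswith(alpha, p)}
        if alpha != "" and len(ext) < 2:
            continue  # not a branching node, so no edge descends from here
        for c in ext:
            if not any(x[:s].endswith(alpha + c) for s in chosen if 1 <= s <= n):
                return False
    return True
```

For banana$, the set {1, 2, 3, 5, 7} passes, while {1, 2, 3, 7} fails: only position 5 witnesses the edge starting with n below the node na.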
Taxonomy
Topics: Algorithms and Data Compression · Artificial Intelligence in Games · Video Analysis and Summarization
