Suffixient Sets

Lore Depuydt; Travis Gagie; Ben Langmead; Giovanni Manzini; Nicola Prezza

arXiv:2312.01359·cs.DS·June 6, 2024·1 cites

Open Access

TL;DR

This paper introduces suffixient sets for texts, bounds their size by the number of runs in the Burrows-Wheeler Transform of the reversed text, and combines them with a straight-line program to build a compressed index that finds maximal exact matches (MEMs).

Contribution

It defines suffixient sets, proves an upper bound on their size, and builds a compressed index that computes the maximal exact matches (MEMs) of a pattern with respect to the text.

Findings

01

Every text has a suffixient set of size at most $2\bar{r}$, where $\bar{r}$ is the number of runs in the BWT of the reversed text.

02

The proposed index uses $O(\bar{r} + g)$ space, where $\bar{r}$ is the number of runs in the BWT of the reversed text and $g$ is the number of rules in the SLP.

03

The index finds MEMs in $O(m \log(\sigma) / \log n + d \log n)$ time.

Abstract

We define a suffixient set for a text $T[1..n]$ to be a set $S$ of positions between 1 and $n$ such that, for any edge descending from a node $u$ to a node $v$ in the suffix tree of $T$, there is an element $s \in S$ such that $u$'s path label is a suffix of $T[1..s-1]$ and $T[s]$ is the first character of $(u, v)$'s edge label. We first show there is a suffixient set of cardinality at most $2\bar{r}$, where $\bar{r}$ is the number of runs in the Burrows-Wheeler Transform of the reverse of $T$. We then show that, given a straight-line program for $T$ with $g$ rules, we can build an $O(\bar{r} + g)$-space index with which, given a pattern $P[1..m]$, we can find the maximal exact matches (MEMs) of $P$ with respect to $T$ in $O(m \log(\sigma) / \log n + d \log n)$ time, where $\sigma$ is the size of the alphabet and $d$ is the number of times we would fully or partially descend…
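
To make the definition concrete, here is a minimal brute-force sketch in Python that checks whether a set of 1-based positions is suffixient. It is not the paper's construction or index. It assumes the text ends with a unique terminator (here `$`) and that the suffix-tree nodes with outgoing edges are the root (the empty string) plus every substring followed by at least two distinct characters. The function names and the example text `banana$` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def right_extensions(text):
    """Map every substring of text (including "") to the characters that follow it."""
    ext = defaultdict(set)
    for i in range(len(text)):
        for j in range(i, len(text)):
            ext[text[i:j]].add(text[j])
    return ext


def is_suffixient(text, positions):
    """Brute-force check of the definition; positions are 1-based as in the abstract."""
    for alpha, chars in right_extensions(text).items():
        # Assumption: nodes with outgoing edges are the root ("") and the
        # substrings followed by at least two distinct characters.
        if alpha != "" and len(chars) < 2:
            continue
        for c in chars:
            # We need some s with alpha a suffix of T[1..s-1] and T[s] == c,
            # which is the same as alpha + c being a suffix of T[1..s].
            if not any(text[:s].endswith(alpha + c) for s in positions):
                return False
    return True


text = "banana$"  # hypothetical example with an assumed unique terminator
print(is_suffixient(text, {1, 2, 5, 7}))  # True
print(is_suffixient(text, {1, 2, 3, 7}))  # False: no position covers "na" followed by "n"
```

The check runs in time polynomial in $n$ and is meant only for small examples. Note that the full set $\{1, \dots, n\}$ always passes, so the interesting question is how small a suffixient set can be.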

Taxonomy

Topics: Algorithms and Data Compression · Artificial Intelligence in Games · Video Analysis and Summarization