Suffixient Arrays: a New Efficient Suffix Array Compression Technique
Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, Nicola Prezza

TL;DR
The paper introduces the Suffixient Array, a simple and efficient suffix array compression technique that reduces space and improves query speed, especially on repetitive texts, outperforming existing methods like the r-index.
Contribution
It presents the Suffixient Array, a novel subset of the suffix array that is smaller, faster, and easier to compute than previous compression methods, with proven efficiency and practical advantages.
Findings
The Suffixient Array is significantly smaller than a traditional suffix array.
It achieves faster query times, close to hardware limits.
Experimental results show superior performance over the r-index.
Abstract
The Suffix Array is a classic text index enabling on-line pattern matching queries via simple binary search. The main drawback of the Suffix Array is that it takes linear space in the text's length, even if the text itself is extremely compressible. Several works in the literature showed that the Suffix Array can be compressed, but they all rely on complex succinct data structures which in practice tend to exhibit poor cache locality and thus significantly slow down queries. In this paper, we propose a new simple and very efficient solution to this problem by presenting the Suffixient Array: a tiny subset of the Suffix Array sufficient to locate on-line one pattern occurrence (in general, all its Maximal Exact Matches) via binary search, provided that random access to the text is available. We prove that: (i) the Suffixient Array length is a strong repetitiveness…
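For context, the classic baseline the abstract starts from can be sketched as follows. This is a minimal illustration of binary search over a plain (uncompressed) Suffix Array, not the Suffixient Array itself; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction is chosen for brevity, not speed.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting suffixes directly; fine for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Locate every occurrence of `pattern` in `text` using two binary searches.

    Each comparison reads the text at a suffix start, so random access to
    the text is required in addition to the array itself.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])
```

Note that `sa` holds one integer per text position, which is the linear-space cost the abstract identifies as the main drawback, regardless of how compressible the text is.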
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Image Retrieval and Classification Techniques
