Breaking the $O(n)$-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees

Dominik Kempa; Tomasz Kociumaka

arXiv:2106.12725·cs.DS·April 20, 2023

TL;DR

This paper introduces a new compressed suffix array and a new compressed suffix tree that can be constructed in $o(n)$ time when the alphabet is sufficiently small, while matching the space and query bounds of earlier structures.

Contribution

It presents the first improvement in 20 years to the construction time of compressed suffix structures: the new algorithms run in sublinear time while matching the space and query bounds of existing structures.

Findings

01

Supports suffix array queries in $O(\log^{\epsilon} n)$ time

02

Supports full suffix tree functionality in $O(\log^{\epsilon} n)$ time

03

Construction in $O(n \min(1, \log \sigma / \sqrt{\log n}))$ time
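
To see when the construction bound in the third finding is actually sublinear, it can help to evaluate it in two regimes. The derivation below is only a reading of the stated formula; the name $T(n, \sigma)$ is introduced here for illustration and does not come from the paper.

```latex
% T(n, sigma) is a hypothetical name for the stated construction time.
\[
  T(n, \sigma) = O\!\left( n \cdot \min\!\left( 1, \frac{\log \sigma}{\sqrt{\log n}} \right) \right)
\]
% Binary alphabet, sigma = 2, so log sigma = 1:
\[
  T(n, 2) = O\!\left( \frac{n}{\sqrt{\log n}} \right) = o(n)
\]
% Large alphabet, log sigma >= sqrt(log n): the minimum equals 1.
\[
  T(n, \sigma) = O(n)
\]
```

In other words, the bound is $o(n)$ exactly when $\log \sigma = o(\sqrt{\log n})$, and it never exceeds the earlier $O(n)$ bound.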

Abstract

The suffix array and the suffix tree are the two most fundamental data structures for string processing. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and Vitter [STOC 2000] and, independently, Ferragina and Manzini [FOCS 2000] introduced space-efficient versions of the suffix array, known as the compressed suffix array (CSA) and the FM-index. Sadakane [SODA 2002] then showed how to augment them to obtain the compressed suffix tree (CST). For a length-$n$ text over an alphabet of size $\sigma$, these structures use only $O(n \log \sigma)$ bits. The biggest remaining open question is how efficiently they can be constructed. After two decades, the fastest algorithms still run in $O(n)$ time [Hon et al., FOCS 2003], which is a $\Theta(\log_{\sigma} n)$ factor away from the lower bound of $\Omega(n / \log_{\sigma} n)$.…
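
As a rough illustration of the space gap the abstract describes, the sketch below builds a plain suffix array by naive sorting and compares its bit count with that of the packed text. It is not the paper's construction: the function names are made up for this example, the sort is far slower than linear time, and the bit counts ignore constant factors and any index overhead.

```python
import math


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order.

    Illustrative only; sorting whole suffixes is much slower than the
    O(n)-time and o(n)-time algorithms discussed in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def plain_sa_bits(n: int) -> int:
    """Bits used by a plain suffix array: n entries of ceil(log2 n) bits."""
    return n * max(1, math.ceil(math.log2(n)))


def packed_text_bits(n: int, sigma: int) -> int:
    """Bits used by the text packed at ceil(log2 sigma) bits per symbol."""
    return n * max(1, math.ceil(math.log2(sigma)))


if __name__ == "__main__":
    print(suffix_array("banana"))        # [5, 3, 1, 0, 4, 2]
    n, sigma = 2**20, 4                  # e.g. a DNA-like alphabet (assumed values)
    print(plain_sa_bits(n))              # n * 20 bits
    print(packed_text_bits(n, sigma))    # n * 2 bits
```

For these assumed values the plain suffix array needs ten times as many bits as the packed text, which is the $\Theta(n \log n)$ versus $O(n \log \sigma)$ gap that compressed structures are designed to close.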
