Breaking the $O(n)$-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees
Dominik Kempa, Tomasz Kociumaka

TL;DR
This paper introduces construction algorithms for compressed suffix arrays and compressed suffix trees that run in sublinear time for small alphabets, faster than previous methods, while matching the space usage and query times of existing structures.
Contribution
It presents the first construction algorithms in 20 years for compressed suffix structures that break the linear-time barrier, while matching the space and query bounds of existing structures.
Findings
Supports suffix array queries in O(log^ε n) time
Supports full suffix tree functionality in O(log^ε n) time
Construction in O(n min(1, log σ / sqrt(log n))) time
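To get a feel for the construction bound, the factor min(1, log σ / sqrt(log n)) can be evaluated for concrete inputs. The sketch below is only illustrative: the function name is hypothetical, base-2 logarithms are an assumption, and the constants hidden by the O-notation are ignored, so the values indicate asymptotic shape rather than real running times.

```python
import math


def construction_factor(n: int, sigma: int) -> float:
    """Evaluate min(1, log(sigma) / sqrt(log(n))) with base-2 logarithms.

    Hypothetical helper: the logarithm base and the omission of constant
    factors are assumptions made for illustration only.
    """
    return min(1.0, math.log2(sigma) / math.sqrt(math.log2(n)))


# Binary alphabet, text of length 2^64: the factor is well below 1.
binary_factor = construction_factor(2**64, 2)

# Byte alphabet, same length: the factor is capped at 1, i.e. linear time.
byte_factor = construction_factor(2**64, 256)
```

The factor drops below 1 only when log σ is smaller than sqrt(log n), which is why the improvement over linear time applies to small alphabets.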
Abstract
The suffix array and the suffix tree are the two most fundamental data structures for string processing. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and Vitter [STOC 2000] and, independently, Ferragina and Manzini [FOCS 2000] introduced space-efficient versions of the suffix array, known as the compressed suffix array (CSA) and the FM-index. Sadakane [SODA 2002] then showed how to augment them to obtain the compressed suffix tree (CST). For a length-$n$ text over an alphabet of size $\sigma$, these structures use only $O(n \log \sigma)$ bits. The biggest remaining open question is how efficiently they can be constructed. After two decades, the fastest algorithms still run in $O(n)$ time [Hon et al., FOCS 2003], which is a $\Theta(\log_\sigma n)$ factor away from the lower bound of $\Omega(n / \log_\sigma n)$.…
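For readers unfamiliar with the underlying structure, here is a minimal sketch of a plain (uncompressed) suffix array and a pattern search over it. This is a naive textbook construction that sorts suffixes directly, not the paper's algorithm and not a compressed structure; the function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the starting positions of all suffixes in lexicographic order.

    Naive approach: sort suffixes by direct comparison. This is far slower
    than the constructions discussed above and is for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all starting positions of pattern in text, in increasing order."""
    m = len(pattern)

    # Binary search for the first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Binary search for the first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
hits = find_occurrences("banana", sa, "ana")
```

Storing one text position per suffix is what leads to the $\Theta(n \log n)$-bit space usage that the compressed variants avoid.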
