# Space-Efficient Construction of Compressed Suffix Trees

**Authors:** Nicola Prezza, Giovanna Rosone

arXiv: 1908.04686 · 2019-08-14

## TL;DR

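This paper presents algorithms that build the components of a compressed suffix tree (LCP values, the PLCP bitvector, and the balanced parentheses representation of the tree topology) directly from the Burrows-Wheeler transform. They can run in succinct working space, i.e. $o(n\log\sigma)$ bits, whereas the previous most space-efficient algorithms worked in $O(n)$ bits and $O(n\log n)$ time.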

## Contribution

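The authors give two algorithms that enumerate all LCP values and suffix tree intervals from the BWT in small working space. They use these as building blocks to construct the PLCP bitvector and the balanced parentheses representation of the suffix tree topology, and to merge the BWTs of string collections. The result improves on the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(n\log n)$ time.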

## Key findings

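- Two algorithms enumerate all LCP values and suffix tree intervals in $O(n\log\sigma)$ time using $o(n\log\sigma)$ bits of working space on top of the input BWT.
- For any parameter $0 < \epsilon \leq 1$, the PLCP bitvector and the balanced parentheses representation of the suffix tree topology are built in $O\left(n(\log\sigma + \epsilon^{-1}\cdot \log\log n)\right)$ time using at most $n\log\sigma \cdot(\epsilon + o(1))$ bits of working space on top of the input BWT and the output.
- BWTs of string collections are merged in $O(n\log\sigma)$ time using $o(n\log\sigma)$ bits of working space.
- The implementation of the LCP construction and BWT merge algorithms uses (in RAM) as few as $n$ bits on top of a packed representation of the input/output and processes data at up to $2.92$ megabases per second.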
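
To make the objects named above concrete, here is a minimal Python sketch that computes the BWT and the LCP array of a small text by brute force. It is not the paper's algorithm: it sorts suffixes explicitly and uses far more than succinct space. The function names and the `$` terminator are assumptions chosen for illustration.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT[r] is the character cyclically preceding the r-th smallest suffix."""
    return "".join(text[i - 1] for i in sa)


def lcp_array(text, sa):
    """LCP[r] is the longest common prefix length of the suffixes of rank r-1 and r.

    LCP[0] is set to 0 by convention, since rank 0 has no predecessor.
    """
    n = len(text)

    def lcp(a, b):
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        return k

    return [0] + [lcp(sa[r - 1], sa[r]) for r in range(1, n)]


# '$' is assumed to be a unique terminator smaller than every other character.
text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
lcp = lcp_array(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
print(lcp)  # [0, 0, 1, 3, 0, 0, 2]
```

The paper's setting is the reverse of this sketch: the BWT is the input, and the LCP values are enumerated from it without materializing the suffix array.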

## Abstract

We show how to build several data structures of central importance to string processing, taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let $n$ be the text length and $\sigma$ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in $O(n\log\sigma)$ time using just $o(n\log\sigma)$ bits of working space on top of the input BWT. Using these algorithms as building blocks, for any parameter $0 < \epsilon \leq 1$ we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in $O\left(n(\log\sigma + \epsilon^{-1}\cdot \log\log n)\right)$ time using at most $n\log\sigma \cdot(\epsilon + o(1))$ bits of working space on top of the input BWT and the output. In particular, this implies that we can build a compressed suffix tree from the BWT using just succinct working space (i.e. $o(n\log\sigma)$ bits) and any time in $\Theta(n\log\sigma) + \omega(n\log\log n)$. This improves the previous most space-efficient algorithms, which worked in $O(n)$ bits and $O(n\log n)$ time. We also consider the problem of merging BWTs of string collections, and provide a solution running in $O(n\log\sigma)$ time and using just $o(n\log\sigma)$ bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as $n$ bits on top of a packed representation of the input/output and processes data as fast as $2.92$ megabases per second.
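
As an illustration of one of the output structures, the sketch below encodes a PLCP array as a bitvector of $2n$ bits. It assumes the standard encoding in which, with 0-based text positions, bit $\mathrm{PLCP}[i] + 2i$ is set for every $i$; this works because $\mathrm{PLCP}[i+1] \geq \mathrm{PLCP}[i] - 1$, so those positions are strictly increasing. Whether the paper uses exactly this indexing convention is an assumption, and the brute-force construction here is not the paper's method. All function names are hypothetical.

```python
def plcp_array(text):
    """PLCP[i] = LCP between suffix i and its lexicographic predecessor (brute force)."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    plcp = [0] * n  # the smallest suffix has no predecessor, so its value stays 0
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        plcp[b] = k
    return plcp


def plcp_bitvector(plcp):
    """Encode PLCP in 2n bits: set bit PLCP[i] + 2i for each text position i."""
    bits = [0] * (2 * len(plcp))
    for i, value in enumerate(plcp):
        bits[value + 2 * i] = 1
    return bits


def decode_plcp(bits):
    """Recover PLCP[i] as select1(i) - 2i, using a plain scan in place of select."""
    ones = [pos for pos, bit in enumerate(bits) if bit]
    return [pos - 2 * i for i, pos in enumerate(ones)]


# '$' is assumed to be a unique terminator smaller than every other character.
plcp = plcp_array("banana$")
bits = plcp_bitvector(plcp)
print(plcp)                       # [0, 3, 2, 1, 0, 0, 0]
print("".join(map(str, bits)))    # 10000111101010
assert decode_plcp(bits) == plcp
```

A practical implementation would replace the linear scan in `decode_plcp` with a constant-time select structure over the bitvector.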

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04686/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1908.04686/full.md

---
Source: https://tomesphere.com/paper/1908.04686