# Computing the BWT and LCP array of a Set of Strings in External Memory

**Authors:** Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

arXiv: 1705.07756 · 2020-12-07

## TL;DR

This paper introduces an external-memory algorithm that simultaneously computes the BWT and LCP array of large string collections, a key step in genome assembly. It uses a backward strategy and needs only O(k + m) main memory for m strings of constant length k.

## Contribution

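It presents a new external-memory algorithm that uses a backward strategy to build the BWT and LCP array simultaneously for a collection of m strings of different lengths. This is an alternative to the authors' earlier in-memory divide-and-conquer method, which merged partial BWTs with a forward approach to sort suffixes.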

## Key findings

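- For m strings of constant length k, the algorithm has O(mkl) time and I/O volume, where l is the maximum value in the LCP array.
- It uses only O(k + m) main memory, so the input collection does not need to fit in RAM.
- The LCP array is instrumental in computing suffix-prefix overlaps among strings, an essential step in many genome assembly algorithms.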

## Abstract

Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multistring generalization of the Burrows-Wheeler Transform (BWT): large requirements of in-memory approaches have stimulated recent developments on external memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set of strings is instrumental to compute the suffix-prefix overlaps among strings, which is an essential step for many genome assembly algorithms. In a previous paper, we presented an in-memory divide-and-conquer method for building the BWT and LCP where we merge partial BWTs with a forward approach to sort suffixes. In this paper, we propose an alternative backward strategy to develop an external memory method to simultaneously build the BWT and the LCP array on a collection of m strings of different lengths. The algorithm over a set of strings having constant length k has O(mkl) time and I/O volume, using O(k + m) main memory, where l is the maximum value in the LCP array.
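
To make the two objects in the abstract concrete, here is a minimal in-memory sketch of the multistring BWT and LCP array, built by sorting all suffixes directly. It is not the paper's external-memory backward algorithm. The function names `bwt_lcp` and `common_prefix_len` are hypothetical, and the conventions are assumptions that may differ from the paper's: each string ends with an implicit sentinel `$` smaller than every other symbol, ties between equal suffixes are broken by string index, the first LCP entry is 0, and sentinels do not count toward the LCP.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def bwt_lcp(strings):
    """Naive BWT and LCP array of a string collection (illustration only).

    Assumed conventions: an implicit sentinel '$' ends each string and
    sorts before every other symbol; equal suffixes are ordered by the
    index of the string they come from.
    """
    suffixes = []
    for i, s in enumerate(strings):
        # j == len(s) is the empty suffix, i.e. the sentinel alone.
        for j in range(len(s) + 1):
            suffixes.append((s[j:], i, j))

    # Python orders a proper prefix before any longer string, which matches
    # a sentinel that is smaller than all alphabet symbols.
    suffixes.sort(key=lambda t: (t[0], t[1]))

    bwt, lcp = [], []
    prev = None
    for text, i, j in suffixes:
        # The BWT symbol is the character preceding the suffix; a
        # whole-string suffix is preceded (cyclically) by the sentinel.
        bwt.append(strings[i][j - 1] if j > 0 else "$")
        lcp.append(0 if prev is None else common_prefix_len(prev, text))
        prev = text
    return "".join(bwt), lcp


bwt, lcp = bwt_lcp(["ACA", "CA"])
print(bwt)       # AACC$A$
print(lcp)       # [0, 0, 0, 1, 1, 0, 2]
print(max(lcp))  # 2, the quantity called l in the O(mkl) bound
```

This sketch materializes every suffix in memory, which is exactly the cost that external-memory approaches are designed to avoid. It is useful only for checking small examples by hand.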

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.07756/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1705.07756/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1705.07756/full.md

---
Source: https://tomesphere.com/paper/1705.07756