A New Lightweight Algorithm to compute the BWT and the LCP array of a Set of Strings
Paola Bonizzoni, Gianluca Della Vedova, Serena Nicosia, Marco Previtali, Raffaella Rizzi

TL;DR
This paper introduces a new lightweight algorithm for efficiently computing the BWT and LCP array for large sets of strings, significantly reducing I/O complexity compared to previous methods, especially in external memory scenarios.
Contribution
The paper presents a novel algorithm that computes BWT and LCP arrays simultaneously with lower I/O complexity, improving efficiency for large string collections.
Findings
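Reduces I/O complexity to O(kmL(log k + log σ)) for m strings of length k over an alphabet of size σ, where L is the length of the longest substring that appears at least twice in the input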
Efficiently handles large string datasets in external memory
Improves over previous lightweight approaches in bioinformatics applications
Abstract
Indexing of very large collections of strings, such as those produced by widespread sequencing technologies, relies heavily on multi-string generalizations of the Burrows-Wheeler Transform (BWT), and various in-memory algorithms have been proposed for this problem. The rapid growth of the data that are routinely processed, such as in bioinformatics, requires a large amount of main memory, and this fact has motivated the development of algorithms that compute the BWT almost entirely in external memory. On the other hand, the related problem of computing the Longest Common Prefix (LCP) array is often instrumental in several algorithms on collections of strings, such as those that compute the suffix-prefix overlap among strings, which is an essential step for many genome assembly algorithms. The best current lightweight approach to compute BWT and LCP array on a set of …
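To make the two objects named in the abstract concrete, here is a minimal in-memory sketch that builds the BWT and LCP array of a small set of strings by sorting all suffixes explicitly. It is not the paper's external-memory algorithm, and it uses memory quadratic in the string length. The function names `naive_bwt_lcp` and `common_prefix_len` are hypothetical, and the conventions used (a distinct end marker per string, all printed as `$`, ordered by string index and smaller than every letter, with the first LCP entry set to 0) are assumptions chosen for illustration.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def naive_bwt_lcp(strings):
    """Naive BWT and LCP array of a set of strings (illustration only).

    Assumed convention: string i ends with its own end marker $_i, where
    $_0 < $_1 < ... < every letter. Sorting by (suffix_text, string_index)
    emulates that order, because Python ranks a proper prefix before any
    longer string and the index breaks ties between equal texts.
    """
    suffixes = []
    for idx, s in enumerate(strings):
        for start in range(len(s) + 1):
            # Symbol preceding the suffix; the whole string is preceded
            # (cyclically) by its end marker, printed here as "$".
            prev = s[start - 1] if start > 0 else "$"
            suffixes.append((s[start:], idx, prev))

    suffixes.sort(key=lambda t: (t[0], t[1]))

    bwt = "".join(prev for _, _, prev in suffixes)
    # End markers are pairwise distinct, so they never extend a common
    # prefix; comparing the texts without markers is enough.
    lcp = [
        0 if i == 0 else common_prefix_len(suffixes[i - 1][0], suffixes[i][0])
        for i in range(len(suffixes))
    ]
    return bwt, lcp


bwt, lcp = naive_bwt_lcp(["AC", "AG"])
print(bwt, lcp)  # CG$$AA [0, 0, 0, 1, 0, 0]
```

The single nonzero LCP entry corresponds to the adjacent suffixes `AC` and `AG`, which share the prefix `A`. Materializing every suffix like this is exactly what becomes infeasible for large sequencing datasets, which is the motivation for lightweight external-memory approaches.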
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Cellular Automata and Applications
