TL;DR
This paper introduces an external memory algorithm that computes the BWT and LCP array of a sequence collection. It takes the amount of available memory as an input parameter, improves on the I/O bound of known algorithms, and has applications to bioinformatics tasks.
Contribution
It presents a novel external memory algorithm that computes the BWT and LCP array of a sequence collection by building partial BWTs in RAM and merging them in external memory, and it extends the approach to key bioinformatics problems.
Findings
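The algorithm performs O(n AveLcp) sequential I/Os, where n is the total length of the collection and AveLcp is its average Longest Common Prefix; this improves on the bounds of known algorithms for the same task.
In experiments, it outperforms the current best algorithm on collections of sequences with different lengths and on collections with small AveLcp.
It enables external memory solutions for bioinformatics problems such as finding repeats and building de Bruijn graphs.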
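To make the AveLcp quantity in the bound concrete, here is a minimal Python sketch that computes the LCP array of a tiny collection by brute force and averages it. It assumes AveLcp is the plain mean of the LCP array over all suffixes and that each sequence ends with its own distinct end marker; the function names are hypothetical, and this quadratic in-RAM approach is only an illustration, not the paper's algorithm.

```python
def sorted_suffixes(seqs):
    """Return all suffixes of all sequences in lexicographic order.

    Each symbol is encoded as a pair: (1, code) for a regular character and
    (0, i) for the end marker of sequence i, so end markers are distinct and
    sort before every regular character (an assumption of this sketch).
    """
    sufs = []
    for i, s in enumerate(seqs):
        t = tuple((1, ord(c)) for c in s) + ((0, i),)
        for j in range(len(t)):
            sufs.append(t[j:])
    sufs.sort()
    return sufs


def lcp_array(sufs):
    """LCP[k] = length of the common prefix of sufs[k-1] and sufs[k]; LCP[0] = 0."""
    lcp = [0]
    for a, b in zip(sufs, sufs[1:]):
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        lcp.append(k)
    return lcp


def ave_lcp(seqs):
    """Mean of the LCP array (assumed definition of AveLcp)."""
    lcp = lcp_array(sorted_suffixes(seqs))
    return sum(lcp) / len(lcp)
```

For the collection ["ab", "ab"], the sorted suffixes share prefixes of lengths 0, 0, 0, 2, 0, 1, so this sketch gives an average of 0.5. Highly repetitive collections push this average up, which is why the bound favors collections with small AveLcp.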
Abstract
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We prove that our algorithm performs O(n AveLcp) sequential I/Os, where n is the total length of the collection, and AveLcp is the average Longest Common Prefix of the collection. This bound is an improvement over the known algorithms for the same task. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and for collections with…
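The splitting phase described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the memory budget is measured in symbols, sequences are grouped greedily in input order, and each partial BWT is built by naive suffix sorting rather than by an optimal linear time algorithm. The names `split_by_budget` and `bwt_of_collection` are hypothetical, and the external memory merge of the partial BWTs (which also yields the LCP values) is not shown.

```python
def split_by_budget(seqs, budget):
    """Greedily group consecutive sequences into subcollections.

    Each group's total length, counting one end marker per sequence, stays
    within `budget` symbols. A single sequence longer than the budget still
    gets a group of its own (this sketch does not split sequences).
    """
    groups, current, size = [], [], 0
    for s in seqs:
        need = len(s) + 1  # +1 for the end marker
        if current and size + need > budget:
            groups.append(current)
            current, size = [], 0
        current.append(s)
        size += need
    if current:
        groups.append(current)
    return groups


def bwt_of_collection(seqs):
    """Naive in-RAM BWT of a collection: sort all suffixes, emit preceding symbols.

    End markers are encoded as (0, i) so they are distinct and sort first;
    every end marker is printed as "$" in the output.
    """
    entries = []
    for i, s in enumerate(seqs):
        t = [(1, ord(c)) for c in s] + [(0, i)]
        for j in range(len(t)):
            # Symbol preceding the suffix; the whole sequence is preceded
            # (cyclically) by its own end marker.
            prev = s[j - 1] if j > 0 else "$"
            entries.append((tuple(t[j:]), prev))
    entries.sort(key=lambda e: e[0])
    return "".join(prev for _, prev in entries)


def partial_bwts(seqs, budget):
    """One BWT per subcollection; merging them is left out of this sketch."""
    return [bwt_of_collection(g) for g in split_by_budget(seqs, budget)]
```

With a budget of 6 symbols, the collection ["ab", "ab", "c"] splits into ["ab", "ab"] and ["c"], whose partial BWTs are "bb$$aa" and "c$". A larger budget yields fewer, larger subcollections and therefore fewer partial BWTs to merge.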
