From H&M to Gap for Lightweight BWT Merging
Giovanni Manzini

TL;DR
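This paper introduces Gap, an algorithm that extends the Holt and McMillan (H&M) method so that, in addition to merging Burrows-Wheeler transforms (BWTs), it also merges the corresponding Longest Common Prefix (LCP) arrays. It does so at the same asymptotic cost and needs extra space only for the LCP values, with applications in bioinformatics.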
Contribution
The paper presents the Gap algorithm, which merges BWTs and LCP arrays simultaneously. The earlier H&M method merges only the BWTs. Gap's only extra space is the space needed to store the LCP values.
Findings
The Gap algorithm has the same asymptotic complexity as H&M.
It can merge BWTs and LCP arrays simultaneously.
It requires additional space only for storing the LCP values.
Abstract
Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] have proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms of a family of strings. In this paper we show that the H&M algorithm can be improved so that, in addition to merging the BWTs, it can also merge the Longest Common Prefix (LCP) arrays. The new algorithm, called Gap because of how it operates, has the same asymptotic cost as the H&M algorithm and requires additional space only for storing the LCP values.
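The merging idea described in the abstract can be illustrated with a small Python sketch. This is not the paper's implementation. It assumes exactly two input strings, terminated by distinct end-markers '\x00' < '\x01' that are smaller than every other symbol. It computes the input BWTs by naive suffix sorting and uses hypothetical function names (bwt, merge_bwt_lcp).

Each round rebuilds the H&M interleave vector Z, where Z[i] says which input string the i-th suffix of the merged order comes from. During the same scan, the sketch records a block boundary whenever two suffixes placed next to each other in a bucket came from different blocks of the previous round. The first time a boundary appears at round h, that position receives the LCP value h - 1. This boundary rule is our reading of how LCP values can be obtained during the H&M iterations.

```python
from collections import Counter


def bwt(text):
    """Naive BWT of `text` (illustration only, roughly O(n^2 log n)).

    Assumes `text` ends with a unique end-marker smaller than every other
    symbol, so sorting suffixes gives the same order as sorting rotations.
    """
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps around to the end-marker when i == 0
    return "".join(text[i - 1] for i in order)


def merge_bwt_lcp(bwt0, bwt1):
    """Hypothetical H&M-style merge of two BWTs that also builds the merged LCP array."""
    n0, n1 = len(bwt0), len(bwt1)
    n = n0 + n1
    bwts = (bwt0, bwt1)

    # start[c] = number of symbols smaller than c in both BWTs (bucket start)
    counts = Counter(bwt0) + Counter(bwt1)
    start, total = {}, 0
    for c in sorted(counts):
        start[c] = total
        total += counts[c]

    z = [0] * n0 + [1] * n1     # Z^(0): all suffixes of string 0 first
    boundary = [False] * n      # boundary[i]: suffixes i-1 and i known to differ
    boundary[0] = True
    lcp = [0] * n               # lcp[0] is 0 by convention

    h = 0
    while not all(boundary):
        h += 1
        prev_boundary = boundary[:]   # block structure of Z^(h-1)
        nxt = dict(start)             # next free slot in each bucket
        last_block = {}               # block id of the last suffix placed in bucket c
        new_z = [0] * n
        k = [0, 0]                    # rows consumed so far in each input BWT
        block = 0
        for pos in range(n):
            if prev_boundary[pos]:
                block = pos           # a new block of Z^(h-1) starts here
            src = z[pos]
            c = bwts[src][k[src]]     # symbol preceding this suffix in its string
            k[src] += 1
            j = nxt[c]
            nxt[c] += 1
            new_z[j] = src
            # New boundary: suffixes j-1 and j share h-1 symbols but not h
            if (j == start[c] or last_block[c] != block) and not boundary[j]:
                boundary[j] = True
                lcp[j] = h - 1
            last_block[c] = block
        z = new_z

    # Interleave the input BWTs according to the final Z
    k = [0, 0]
    merged = []
    for src in z:
        merged.append(bwts[src][k[src]])
        k[src] += 1
    return "".join(merged), lcp


if __name__ == "__main__":
    merged, lcp = merge_bwt_lcp(bwt("ab\x00"), bwt("b\x01"))
    print(repr(merged), lcp)  # 'bb\x00a\x01' [0, 0, 0, 0, 1]
```

The loop stops once every position is a block boundary, which happens after (maximum LCP + 1) rounds. At that point Z gives the fully sorted order. The quadratic bwt helper is included only to make the example self-contained.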
