A survey of BWT variants for string collections

Davide Cenzato; Zsuzsanna Lipták

arXiv:2202.13235 · cs.DS · May 28, 2025

TL;DR

This survey examines various BWT variants used for string collections in bioinformatics, highlighting their differences in theory and practice, and analyzing how these differences impact biological data processing.

Contribution

The paper systematically reviews 18 tools, identifies six BWT variants, and compares their theoretical and practical differences across multiple biological datasets.

Findings

1. Significant differences exist between BWT variants, especially on collections of similar short sequences.
2. The number of BWT runs varies by up to a factor of 4.2 across variants.
3. For many tools, the input order of the collection affects the BWT output.
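To make the run-count and input-order findings concrete, here is a minimal Python sketch under our own assumptions. Each string gets its own end-marker, end-markers sort before all letters and are ordered by input position, and all suffixes are sorted naively (quadratic, for tiny illustrative inputs only). The helper names `multidollar_bwt` and `runs` are hypothetical and are not taken from the paper or its repository.

```python
from itertools import groupby


def multidollar_bwt(strings):
    """Naive BWT of a string collection (illustrative sketch).

    Assumption: string i is terminated by its own end-marker $_i, and
    end-markers sort before all letters and by input position i.
    All suffixes are sorted naively, so this is only for tiny inputs.
    """
    text = []
    for i, s in enumerate(strings):
        text.extend((1, ch) for ch in s)  # (1, ch): regular character
        text.append((0, i))               # (0, i): end-marker of string i
    sa = sorted(range(len(text)), key=lambda p: text[p:])
    # Preceding token of each suffix; text[-1] wraps around for p == 0.
    return "".join("$" if text[p - 1][0] == 0 else text[p - 1][1] for p in sa)


def runs(bwt):
    """Number r of maximal runs of equal characters.

    Simplifying assumption: all end-markers are printed as '$' and
    therefore counted as one symbol.
    """
    return sum(1 for _ in groupby(bwt))


for order in (["TA", "GA", "TA"], ["TA", "TA", "GA"]):
    b = multidollar_bwt(order)
    print(order, b, runs(b))
# ['TA', 'GA', 'TA'] AAATGT$$$ 5
# ['TA', 'TA', 'GA'] AAATTG$$$ 4
```

Swapping the last two strings leaves the collection unchanged as a multiset, yet the transform changes and its run count r drops from 5 to 4. This happens because the characters preceding the three `A$_i` suffixes are listed in separator order, that is, in input order.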

Abstract

In recent years, the focus of bioinformatics research has moved from individual sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform (BWT) in string processing, a number of dedicated tools have been developed for computing the BWT of string collections. While the focus has been on improving efficiency, both in space and time, the exact definition of the BWT employed has not been at the center of attention. As we show in this paper, the different tools in use often compute non-equivalent BWT variants: the resulting transforms can differ from each other significantly, including the number $r$ of runs, a central parameter of the BWT. Moreover, with many tools, the transform depends on the input order of the collection. In other words, on the same dataset, the same tool may output different transforms if the dataset is given in a different…
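One way to remove the dependence on input order is to put the collection into a canonical order before building the transform. The sketch below assumes colexicographic order (strings compared from right to left) as that canonical order. It illustrates the idea and makes no claim about how any particular tool or the paper defines its variants. It repeats the naive, hypothetical `multidollar_bwt` helper so that it runs on its own, and `colex_sorted_bwt` is likewise a hypothetical name.

```python
from itertools import permutations


def multidollar_bwt(strings):
    """Naive BWT of a string collection (illustrative sketch).

    Assumption: string i is terminated by its own end-marker $_i, and
    end-markers sort before all letters and by input position i.
    """
    text = []
    for i, s in enumerate(strings):
        text.extend((1, ch) for ch in s)  # (1, ch): regular character
        text.append((0, i))               # (0, i): end-marker of string i
    sa = sorted(range(len(text)), key=lambda p: text[p:])
    # Preceding token of each suffix; text[-1] wraps around for p == 0.
    return "".join("$" if text[p - 1][0] == 0 else text[p - 1][1] for p in sa)


def colex_sorted_bwt(strings):
    """Hypothetical helper: sort strings colexicographically, then transform."""
    return multidollar_bwt(sorted(strings, key=lambda s: s[::-1]))


coll = ["TA", "GA", "TA"]
orders = [list(p) for p in permutations(coll)]
print(len({multidollar_bwt(o) for o in orders}))   # 3 distinct transforms
print(len({colex_sorted_bwt(o) for o in orders}))  # 1: order no longer matters
print(colex_sorted_bwt(coll))                      # AAAGTT$$$
```

Fixing the order before the transform trades a sort of the input strings for a result that depends only on the collection itself. Here every input permutation yields the same transform, `AAAGTT$$$`, with 4 runs.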

Code & Models

Repositories

davidecenzato/bwt-variants-for-string-collections (official)