Resolution of the Burrows-Wheeler Transform Conjecture
Dominik Kempa, Tomasz Kociumaka

TL;DR
This paper proves that the number of equal-letter runs in the Burrows-Wheeler Transform (BWT) is at most the size of the LZ77 parsing times a polylogarithmic factor, which ties BWT-based compression to LZ77.
Contribution
It establishes a non-trivial bound on BWT runs in terms of LZ77 size, and introduces new algorithms and data structures for converting LZ77 parsing to BWT.
Findings
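r = O(z log^2 n) for every text of length n, where r is the number of equal-letter runs in the BWT and z is the size of the LZ77 parsing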
Enables suffix tree functionality in O(z polylog n) space
Provides an efficient algorithm to convert LZ77 parsing to BWT
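To make the quantity z in these findings concrete, here is a minimal sketch of a greedy LZ77 parsing in Python. It assumes the common variant in which each phrase is either the first occurrence of a letter or the longest prefix of the remaining text that also occurs starting at an earlier position (overlaps allowed). The paper's exact definition may differ in details, and the function name lz77_phrases is hypothetical. The search is naive and far slower than the algorithms the paper describes.

```python
def lz77_phrases(text: str) -> list[str]:
    """Greedy LZ77 parsing (illustrative variant, self-overlap allowed)."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Extend the phrase while text[i:i+length+1] has an occurrence
        # that starts strictly before position i. Limiting the search
        # window to text[0:i+length] enforces exactly that.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        step = max(length, 1)  # a letter seen for the first time forms its own phrase
        phrases.append(text[i:i + step])
        i += step
    return phrases


print(lz77_phrases("banana"))     # ['b', 'a', 'n', 'ana']
print(len(lz77_phrases("aaaa")))  # 2
```

The number of phrases returned plays the role of z. Highly repetitive inputs such as "aaaa" parse into very few phrases, which is the regime where the bound r = O(z log^2 n) is most informative.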
Abstract
The Burrows-Wheeler Transform (BWT) is an invertible text transformation that permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the main component of popular lossless compression programs (such as bzip2) as well as recent powerful compressed indexes (such as the r-index [Gagie et al., J. ACM, 2020]), central in modern bioinformatics. The compression ratio of BWT is quantified by the number r of equal-letter runs. Despite the practical significance of BWT, no non-trivial bound on the value of r is known. This is in contrast to nearly all other known compression methods, whose sizes have been shown to be either always within a polylog n factor (where n is the length of the text) from z, the size of the Lempel-Ziv (LZ77) parsing of the text, or significantly larger in the worst case (by an n^ε factor for ε > 0). In…
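As an illustration of the definition in the abstract, the following Python sketch builds the BWT by sorting suffixes and counts its equal-letter runs. It assumes a sentinel character `$` that does not occur in the text and sorts before every other symbol. That convention, and the function names bwt and count_runs, are assumptions made for this example. Sorting suffixes directly takes quadratic time or worse and is meant only to show what r measures.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via naive suffix sorting (illustrative only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort suffix start positions by the suffixes they denote.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # For each suffix in sorted order, output the symbol preceding it
    # (s[-1] is the sentinel, which cyclically precedes the whole text).
    return "".join(s[i - 1] for i in order)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive symbols."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


transformed = bwt("banana")
print(transformed)              # annb$aa
print(count_runs(transformed))  # 5
```

For "banana" the transform groups equal letters together (the runs are a, nn, b, $, aa), which is the effect that run-length based BWT compressors and indexes exploit.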
