On the compressiveness of the Burrows-Wheeler transform
Hideo Bannai, Tomohiro I, Yuto Nakashima

TL;DR
This paper investigates the compressiveness of the Burrows-Wheeler transform (BWT) and its bijective variant (BBWT), revealing their ability to preserve or enhance string compressibility beyond traditional dictionary compression measures.
Contribution
It extends previous results on BWT and BBWT size relations, introduces new measures for clustering effects, and demonstrates BBWT's potential to improve compressibility beyond dictionary compression limits.
Findings
BWT and BBWT increase string repetitiveness by at most a polylogarithmic factor with respect to various dictionary-compression-based measures.
There exist strings that are incompressible by dictionary methods but become highly compressible after BBWT.
Applying BBWT can sometimes surpass the compression limits of dictionary-based methods.
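For readers unfamiliar with the bijective variant, the following is a minimal sketch of one common way to define BBWT: factor the string into Lyndon words, collect every rotation of every factor, sort them by their infinite periodic extensions, and read off the last characters. The function names `lyndon_factors` and `bbwt` are illustrative and are not taken from the paper, and the sort is a naive one chosen for readability, not efficiency.

```python
def lyndon_factors(s: str) -> list[str]:
    """Lyndon factorization via Duval's algorithm (factors are non-increasing)."""
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # Reset k on a strict increase, otherwise keep matching the period.
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bbwt(s: str) -> str:
    """Naive BBWT: sort rotations of all Lyndon factors by infinite repetition."""
    n = len(s)
    rotations = [f[i:] + f[:i] for f in lyndon_factors(s) for i in range(len(f))]

    def omega_key(rot: str) -> str:
        # A prefix of length 2n of rot repeated forever is enough to compare
        # two infinite repetitions, since both periods are at most n.
        reps = -(-2 * n // len(rot))  # ceiling division
        return (rot * reps)[: 2 * n]

    rotations.sort(key=omega_key)
    return "".join(rot[-1] for rot in rotations)


if __name__ == "__main__":
    print(lyndon_factors("banana"))
    print(bbwt("banana"))
```

Unlike BWT, this transform needs no end-of-string sentinel and no stored index to be invertible, which is why it is called bijective.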
Abstract
The Burrows-Wheeler transform (BWT) is a reversible transform that converts a string into another string. The size of the run-length encoded BWT (RLBWT) can be interpreted as a measure of repetitiveness in the class of representations called dictionary compression, which are essentially representations based on copy and paste operations. In this paper, we shed new light on the compressiveness of BWT and the bijective BWT (BBWT). We first extend previous results on the relations of their run-length compressed sizes. We also show that the so-called "clustering effect" of BWT and BBWT can be captured by measures other than empirical entropy or run-length encoding. In particular, we show that BWT and BBWT do not increase the repetitiveness of the string with respect to various measures based on dictionary compression by more than a polylogarithmic…
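To make the notions of BWT and its run-length encoded size concrete, here is a small sketch that computes the transform by sorting all rotations and then counts the maximal runs of equal characters. It is a quadratic-space illustration only; the helper names `bwt` and `run_count` are assumptions made for this example, and no end-of-string sentinel is appended.

```python
def bwt(s: str) -> str:
    """BWT as the last column of the sorted rotation matrix (illustration only)."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_count(s: str) -> int:
    """Number of maximal runs of equal characters, i.e. the run-length encoded size."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    print(transformed, run_count(text), run_count(transformed))
```

On "banana" the input has six runs while its transform has three, a small instance of the clustering effect the abstract refers to.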
Taxonomy
Topics: Advanced Numerical Analysis Techniques
