Sensitivity of Repetitiveness Measures to String Reversal
Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, Cristian Urbina

TL;DR
This paper investigates how reversing a string affects several repetitiveness measures, showing that measures such as the number of BWT runs and the size of the LZ parsing can differ substantially between a string and its reverse.
Contribution
It gives an asymptotically tight lower bound on the additive sensitivity of the BWT run count to string reversal, and shows analogous results for the LZ parsing size and for several variants of both measures.
Findings
Reversal can increase the BWT run count additively by Θ(n)
For one family of strings, the ratio of the LZ parsing size of the reversed string to that of the original approaches 3 from below
Sensitivity bounds for lexicographic parsing size are established
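To make the first finding concrete, here is a minimal Python sketch that counts BWT runs for a string and for its reverse. It assumes the BWT is defined by sorting all cyclic rotations with no end-of-string symbol, which is only one of the variants the paper considers. The function names (`bwt`, `count_runs`, `bwt_runs`, `reversal_gap`, `max_reversal_gap`) are illustrative and do not come from the paper. The construction is naive and intended for short strings only.

```python
from itertools import product


def bwt(w: str) -> str:
    """BWT of w via sorted cyclic rotations (assumption: no end-of-string symbol)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal letters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def bwt_runs(w: str) -> int:
    """The measure r: number of runs in the BWT of w."""
    return count_runs(bwt(w))


def reversal_gap(w: str) -> int:
    """Additive change in r when w is reversed: r(reverse of w) - r(w)."""
    return bwt_runs(w[::-1]) - bwt_runs(w)


def max_reversal_gap(n: int, alphabet: str = "ab") -> int:
    """Brute-force maximum of reversal_gap over all strings of length n."""
    return max(reversal_gap("".join(t)) for t in product(alphabet, repeat=n))
```

Calling `max_reversal_gap(n)` for small `n` gives a hands-on feel for how far the run count can move under reversal. The exhaustive search is exponential in `n`, so it only illustrates the quantity being bounded and says nothing about the paper's construction.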
Abstract
We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, $r$, of runs in the run-length encoding of the Burrows--Wheeler transform (BWT) can increase additively by $\Theta(n)$ when reversing the string. This substantially improves the known lower-bound for the additive sensitivity of $r$ and it is asymptotically tight. We generalize our result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size, $z$, of the Lempel--Ziv 77 (LZ) parsing of the text, and also for some of its variants, including the non-overlapping LZ parsing, and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio $z(w^R)/z(w)$ approaches $3$ from below as $|w|\rightarrow…
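The LZ parsing size mentioned in the abstract can be computed for short strings with the sketch below. It assumes the greedy left-to-right variant in which each phrase is the longest prefix of the remaining text that also occurs starting at an earlier position (overlaps allowed), or a single character if no such occurrence exists. Other LZ variants named in the abstract (non-overlapping, LZ-end) are not covered. The names `lz77_size` and `reversal_ratio` are illustrative only.

```python
def lz77_size(w: str) -> int:
    """Number of phrases in a greedy LZ77-style parsing of w (naive, cubic time)."""
    n = len(w)
    i = 0  # start of the current phrase
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting position of a source
            length = 0
            # Extend the match; the source may overlap the phrase itself.
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        # A phrase is the longest previous match, or one fresh character.
        i += max(longest, 1)
        phrases += 1
    return phrases


def reversal_ratio(w: str) -> float:
    """Ratio of the parsing size of reversed w to that of w (w must be non-empty)."""
    return lz77_size(w[::-1]) / lz77_size(w)
```

For example, `lz77_size("abab")` parses the string as `a | b | ab` and returns 3. Evaluating `reversal_ratio` on candidate strings is a simple way to explore how the parsing size responds to reversal under this particular definition.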
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Machine Learning and Algorithms
