A New Class of String Transformations for Compressed Text Indexing
Raffaele Giancarlo, Giovanni Manzini, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino

TL;DR
This paper introduces a new class of string transformations, called local orderings-based transformations, which share the key properties of the Burrows-Wheeler Transform and are useful for compressed text indexing, especially in repetitive data collections.
Contribution
It presents a novel family of string transformations that generalize BWT and ABWT, with applications to efficient compressed indexing and pattern search.
Findings
The new transformations support pattern search and can be used to build the r-index.
They include BWT and ABWT as special cases.
A linear-time algorithm is provided to find the BWT variant that minimizes the number of runs.
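To make the quantities in these findings concrete, here is a minimal Python sketch of the two special cases named above, BWT and ABWT, together with the run count that the r-index depends on. It is not the paper's algorithm: it sorts all rotations naively (quadratic space), omits any end-of-string sentinel, and the helper names (`rotations`, `alternating_key`, `count_runs`) are hypothetical. The ABWT ordering used here is the alternating lexicographic order, assumed to compare even 0-based positions in standard order and odd positions in reverse order.

```python
from itertools import groupby


def rotations(s):
    # All cyclic rotations of s (naive; fine for short illustrative inputs).
    return [s[i:] + s[:i] for i in range(len(s))]


def bwt(s):
    # Last column of the lexicographically sorted rotation matrix.
    return "".join(r[-1] for r in sorted(rotations(s)))


def alternating_key(r):
    # Assumption: standard order at even 0-based positions,
    # reversed order at odd positions (alternating lexicographic order).
    return tuple(ord(c) if i % 2 == 0 else -ord(c) for i, c in enumerate(r))


def abwt(s):
    # Same construction as bwt, but rotations are sorted in alternating order.
    return "".join(r[-1] for r in sorted(rotations(s), key=alternating_key))


def count_runs(t):
    # Number of maximal blocks of equal consecutive characters.
    return sum(1 for _ in groupby(t))
```

On the input `banana`, `bwt` gives `nnbaaa` and `abwt` gives `bnnaaa`, each with three runs. Different orderings can yield different run counts on other inputs, which is why choosing the ordering matters for run-based indexes.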
Abstract
Introduced about thirty years ago in the field of Data Compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challenge for many researchers for a long time. Among the known BWT variants, the only one that has been recently shown to be a valid alternative to BWT is the Alternating BWT (ABWT), another invertible string transformation introduced about ten years ago in connection with a generalization of Lyndon words. In this paper, we introduce a whole class of new string transformations, called local orderings-based transformations, which have all the myriad virtues of BWT. We show that this new family is a…
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
