Local Decodability of the Burrows-Wheeler Transform
Sandip Sinha, Omri Weinstein

TL;DR
This paper improves the efficiency of locally decoding substrings from the Burrows-Wheeler Transform (BWT), reducing redundancy and decoding time, with significant implications for compressed text indexing and data structures.
Contribution
It introduces a near-quadratic reduction in redundancy for local decoding of BWT and designs a locally-decodable Move-to-Front code, advancing compressed data structures.
Findings
Reduces redundancy from $\tilde{O}(n/\sqrt{t})$ to $\tilde{O}(n \log(t)/t)$
Provides an exponential improvement in pattern-matching redundancy
Establishes a lower bound of $\Omega(n/t^2)$ for local decoding data structures
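As a rough reading of why the first finding is described as a near-quadratic reduction, the two redundancy bounds can be compared at the same decoding time $t$. The symbols $r_{\text{old}}$ and $r_{\text{new}}$ are labels introduced here for illustration, and the polylogarithmic factors hidden by $\tilde{O}$ are ignored:

```latex
\begin{align*}
r_{\text{old}} &\approx \frac{n}{\sqrt{t}}
  \quad\Longrightarrow\quad \frac{1}{t} \approx \left(\frac{r_{\text{old}}}{n}\right)^{2}, \\
r_{\text{new}} &\approx \frac{n \log t}{t}
  \approx \frac{r_{\text{old}}^{2}}{n}\,\log t .
\end{align*}
```

Under these simplifications, the relative redundancy $r/n$ is squared, up to the $\log t$ factor.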
Abstract
The Burrows-Wheeler Transform (BWT) is among the most influential discoveries in text compression and DNA storage. It is a reversible preprocessing step that rearranges an $n$-letter string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings, and is the basis of the \texttt{bzip} compression program. Alas, the decoding process of BWT is inherently sequential and requires $\Omega(n)$ time even to retrieve a \emph{single} character. We study the succinct data structure problem of locally decoding short substrings of a given text under its \emph{compressed} BWT, i.e., with small additive redundancy over the \emph{Move-To-Front} (\texttt{bzip}) compression. The celebrated BWT-based FM-index (FOCS '00), as well as other related literature, yield a trade-off of $\tilde{O}(n/\sqrt{t})$ bits, when a single character is to be…
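To make the objects in the abstract concrete, here is a minimal Python sketch of the textbook BWT, its sequential inversion, and Move-to-Front coding. It is not the paper's data structure: the function names, the `$` end-of-text sentinel, and the quadratic sort-all-rotations construction are assumptions chosen for readability. The inversion loop illustrates why plain BWT decoding is sequential: each step depends on the position computed in the previous one.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text+sentinel, take the last column.

    Assumes the sentinel does not occur in text and sorts before every
    character of text (true for '$' versus ASCII letters).
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert the BWT by walking a chain of positions, one character per step."""
    n = len(last)
    # Stable sort of positions by character: order[j] is the position in
    # `last` of the character that begins the j-th sorted rotation.
    order = sorted(range(n), key=lambda i: last[i])
    out = []
    j = order[0]  # row 0 starts with the sentinel; follow it to the text start
    for _ in range(n):
        j = order[j]          # each step needs the previous position
        out.append(last[j])
    return "".join(out).rstrip(sentinel)


def move_to_front(s, alphabet):
    """Move-to-Front: emit each symbol's current index, then move it to the front."""
    table = list(alphabet)
    codes = []
    for ch in s:
        i = table.index(ch)
        codes.append(i)
        table.insert(0, table.pop(i))
    return codes


transformed = bwt("banana")
codes = move_to_front(transformed, sorted(set(transformed)))
recovered = inverse_bwt(transformed)
```

On `"banana"` the transform groups equal letters together (`"annb$aa"`), and Move-to-Front turns each repeated letter into a `0`, which is what makes the result easy to compress on longer, more regular inputs.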
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Cellular Automata and Applications
