Incongruity-sensitive access to highly compressed strings
Ferdinando Cicalese, Zsuzsanna Lipták, Travis Gagie, Gonzalo Navarro, Nicola Prezza, Cristian Urbina

TL;DR
This paper introduces data structures for random access to highly compressed strings that answer queries faster for incongruous characters, that is, characters lying in relatively incompressible substrings.
Contribution
It presents new data structures supporting faster random access in highly compressed strings by leveraging character incongruity and phrase source overlap properties.
Findings
Access time for a character is logarithmic in the length of the longest repeated substring containing it; a related result depends on how phrases' sources overlap.
Supports efficient access in run-length compressed straight-line programs and block trees.
Faster access is achieved for characters that are incongruous or less compressible.
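For orientation, the baseline that these findings improve on can be sketched as a plain top-down descent through a run-length compressed straight-line program. This is not the paper's data structure: it is the textbook approach, whose time is proportional to the grammar's height. The rule encoding (`"char"`, `"pair"`, `"run"` tuples) and the function names are assumptions made for this illustration only.

```python
# Hypothetical rule encodings (illustrative, not taken from the paper):
#   ("char", c)      X -> c
#   ("pair", A, B)   X -> A B
#   ("run", A, k)    X -> A^k   (run-length rule, k copies of A)

def expansion_lengths(rules):
    """Expansion length of every nonterminal.

    Assumes `rules` is a dict listed bottom-up: each rule only refers to
    nonterminals that appear earlier in the dict.
    """
    length = {}
    for name, rule in rules.items():
        if rule[0] == "char":
            length[name] = 1
        elif rule[0] == "pair":
            length[name] = length[rule[1]] + length[rule[2]]
        else:  # "run"
            length[name] = length[rule[1]] * rule[2]
    return length

def access(rules, length, start, i):
    """Return character i (0-based) of the expansion of `start`."""
    if not 0 <= i < length[start]:
        raise IndexError(i)
    node = start
    while True:
        rule = rules[node]
        if rule[0] == "char":
            return rule[1]
        if rule[0] == "pair":
            left, right = rule[1], rule[2]
            if i < length[left]:
                node = left
            else:
                i -= length[left]
                node = right
        else:  # "run": every copy is identical, so reduce i modulo one copy
            child = rule[1]
            i %= length[child]
            node = child

# Tiny example grammar: S -> D A, D -> C^3, C -> A B, A -> a, B -> b
rules = {
    "A": ("char", "a"),
    "B": ("char", "b"),
    "C": ("pair", "A", "B"),
    "D": ("run", "C", 3),
    "S": ("pair", "D", "A"),
}
length = expansion_lengths(rules)
text = "".join(access(rules, length, "S", i) for i in range(length["S"]))
```

Every query here walks from the start symbol down to a leaf regardless of which character is requested; the findings above concern structures whose query time instead adapts to the queried character.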
Abstract
Random access to highly compressed strings -- represented by straight-line programs or Lempel-Ziv parses, for example -- is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case, but previous authors have shown how to support faster access to specific characters and their neighbourhoods. In this paper we explore whether, since better compression can impede access, we can support faster access to relatively incompressible substrings of highly compressed strings. We first show how, given a run-length compressed straight-line program (RLSLP) or a block tree, we can build a data structure, with space bounded in terms of the size of the RLSLP or of the block tree, respectively, that supports access to any character in time logarithmic in the length of the longest repeated substring containing that character. That is, the more…
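To make the quantity in the abstract concrete, the following brute-force sketch computes, for every position of a short string, the length of the longest repeated substring containing that position. It only illustrates the measure and is in no way the paper's construction. It assumes that "repeated" means occurring at least twice in the string, with overlapping occurrences allowed; the function name is hypothetical.

```python
def longest_repeat_containing(s):
    """For each position i, the length of the longest substring that
    covers i and occurs at least twice in s (0 if there is none).

    Brute force over all substrings; only suitable for tiny inputs.
    """
    n = len(s)
    best = [0] * n
    for a in range(n):
        for b in range(a + 1, n + 1):
            sub = s[a:b]
            # Repeated if it occurs before position a or again after it.
            if s.find(sub) != a or s.find(sub, a + 1) != -1:
                for i in range(a, b):
                    best[i] = max(best[i], b - a)
    return best

print(longest_repeat_containing("abcabcx"))  # [3, 3, 3, 3, 3, 3, 0]
```

In `"abcabcx"` the final `x` lies in no repeated substring, so it is the most incongruous character in this sense, and it is the kind of character for which the abstract promises the fastest access.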
Taxonomy
Topics: Algorithms and Data Compression · semigroups and automata theory · Machine Learning and Algorithms
