Random Access to Grammar Compressed Strings

Philip Bille; Gad M. Landau; Rajeev Raman; Kunihiko Sadakane; Srinivasa Rao Satti; Oren Weimann

arXiv:1001.1565·cs.DS·October 30, 2013

TL;DR

This paper introduces a new representation of grammar-compressed strings that supports random access to any character in O(log N) time and decompression of any substring without decompressing the whole string, and applies it to approximate string matching and to navigation in compressed trees.

Contribution

The paper presents a new grammar representation that allows fast random access and substring decompression, along with algorithms for approximate string matching and tree navigation on compressed data.
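To make the random access problem concrete, here is a minimal sketch of the naive baseline, not the paper's representation. It assumes the grammar is a straight-line program in which every rule is either a single terminal or a pair of rules; the toy `RULES` dictionary and the function names are hypothetical. Each rule stores the length of its expansion, and a query walks down from the root, going left or right by comparing the index with the left child's length.

```python
# Hypothetical toy grammar: each rule is a terminal character
# or a pair (left rule, right rule).
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "A"),   # derives "aba"
    "X3": ("X2", "X1"),  # derives "abaab"
    "X4": ("X3", "X2"),  # derives "abaababa"
}


def expansion_lengths(rules):
    """Length of the string each rule derives, computed without expanding it."""
    lengths = {}

    def length(name):
        if name not in lengths:
            rhs = rules[name]
            if isinstance(rhs, str):
                lengths[name] = 1
            else:
                lengths[name] = length(rhs[0]) + length(rhs[1])
        return lengths[name]

    for name in rules:
        length(name)
    return lengths


def access(rules, lengths, root, i):
    """Return character i (0-based) of the string derived from root."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left           # the character lies in the left expansion
        else:
            i -= lengths[left]    # skip the left expansion entirely
            node = right
    return rules[node]


lengths = expansion_lengths(RULES)
text = "".join(access(RULES, lengths, "X4", i) for i in range(lengths["X4"]))
```

A query in this sketch costs time proportional to the height of the parse tree, which for an unbalanced grammar can be far larger than log N. Closing that gap, so that random access takes O(log N) time regardless of the grammar's shape, is what the paper's representation is for.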

Findings

01

Achieves O(log N) random access time on grammar-compressed strings, where N is the length of the uncompressed string.

02

Supports efficient substring decompression with similar complexity to random access.

03

Provides improved algorithms for approximate string matching on compressed strings.

Abstract

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n \cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length…
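As an illustration of what substring decompression means, the following is a minimal sketch of a naive approach, not the paper's data structure. It assumes a straight-line program in which every rule is a single terminal or a pair of rules; the toy `RULES` dictionary and the function names are hypothetical. The traversal skips every subtree that lies entirely before the requested start and stops as soon as `m` characters have been emitted, so the rest of the string is never expanded.

```python
# Hypothetical toy grammar: each rule is a terminal character
# or a pair (left rule, right rule). "X4" derives "abaababa".
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),
    "X2": ("X1", "A"),
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
}


def expansion_lengths(rules):
    """Length of the string each rule derives, computed without expanding it."""
    lengths = {}

    def length(name):
        if name not in lengths:
            rhs = rules[name]
            if isinstance(rhs, str):
                lengths[name] = 1
            else:
                lengths[name] = length(rhs[0]) + length(rhs[1])
        return lengths[name]

    for name in rules:
        length(name)
    return lengths


def substring(rules, lengths, root, start, m):
    """Return the m characters beginning at 0-based position start."""
    if start < 0 or m < 0 or start + m > lengths[root]:
        raise IndexError((start, m))
    out = []
    # Each stack entry is (rule, number of leading characters to skip).
    stack = [(root, start)]
    while stack and len(out) < m:
        node, skip = stack.pop()
        rhs = rules[node]
        if isinstance(rhs, str):
            out.append(rhs)  # skip is always 0 when a terminal is reached
            continue
        left, right = rhs
        if skip >= lengths[left]:
            # The whole left expansion precedes the substring: skip it.
            stack.append((right, skip - lengths[left]))
        else:
            stack.append((right, 0))   # visited after the left child
            stack.append((left, skip))
    return "".join(out)


lengths = expansion_lengths(RULES)
piece = substring(RULES, lengths, "X4", 2, 4)
```

This naive traversal does work roughly proportional to the parse-tree height plus `m`, so its cost still depends on how balanced the grammar is.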
