Efficient LZ78 factorization of grammar compressed text

Hideo Bannai; Shunsuke Inenaga; Masayuki Takeda

arXiv:1207.4607·cs.DS·May 27, 2013

TL;DR

This paper introduces an efficient algorithm for computing the LZ78 factorization directly from a grammar-compressed text represented as an SLP, significantly improving performance on compressible data.

Contribution

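The authors give an algorithm that computes the LZ78 factorization directly from an SLP. Its running time depends on the SLP size $n$ and the number of LZ78 factors $m$, not only on the uncompressed length $N$, and refined variants get faster as the SLP captures more of the text's redundancy.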

Findings

01

The improved algorithm runs in $O(nL + m \log N)$ time, where $L$ is the length of the longest LZ78 factor.

02

Performance improves with higher redundancy and smaller SLP size.

03

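For a constant-size alphabet, the $(N - α)$ variant is never asymptotically slower than linear time in $N$, and it improves as the text becomes more compressible.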
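
The comparison with linear time can be checked from the bounds quoted in the abstract. The short derivation below is a sketch under two assumptions not spelled out on this page: that the $(N - α)$ variant keeps the $m \log N$ term unchanged, and that the alphabet size $σ$ is constant.

$$
m \log N = O\!\left(\frac{N}{\log_{\sigma} N}\right) \cdot \log N = O(N \log \sigma),
\qquad
(N - \alpha) + m \log N = O(N + N \log \sigma) = O(N) \quad \text{for constant } \sigma .
$$

The first step uses $\log N / \log_{\sigma} N = \log \sigma$; the second uses $\alpha \geq 0$, so the bound can only shrink as $\alpha$ grows.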

Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m \log N)$ time and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - α)$, where $α \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N / \log_{σ} N)$, where $σ$ is the alphabet size, the latter is…
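
To make the two objects in the abstract concrete, here is a minimal Python sketch of an SLP and of the textbook LZ78 factorization. It is not the paper's algorithm: it decompresses the SLP and factorizes the plain text with a trie, which is the uncompressed baseline the paper aims to beat. The dictionary encoding of rules and the names `slp_lengths`, `slp_expand`, and `lz78_factorize` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding (an assumption, not taken from the paper):
# each variable maps either to one character (rule X -> a)
# or to a pair of variables (rule X -> Y Z), as in Chomsky normal form.
Rule = Union[str, Tuple[str, str]]


def slp_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Length of the string derived by each variable.

    Assumes `rules` is ordered bottom-up, so every right-hand side
    refers only to variables that appear earlier in the dict.
    """
    lengths: Dict[str, int] = {}
    for var, rhs in rules.items():
        if isinstance(rhs, str):
            lengths[var] = 1
        else:
            left, right = rhs
            lengths[var] = lengths[left] + lengths[right]
    return lengths


def slp_expand(rules: Dict[str, Rule], start: str) -> str:
    """Decompress the SLP by walking its derivation tree left to right."""
    out: List[str] = []
    stack = [start]
    while stack:
        rhs = rules[stack.pop()]
        if isinstance(rhs, str):
            out.append(rhs)
        else:
            # Push the right child first so the left child is expanded first.
            stack.append(rhs[1])
            stack.append(rhs[0])
    return "".join(out)


def lz78_factorize(text: str) -> List[str]:
    """Textbook LZ78 on plain text: each factor is the longest
    previous factor that matches here, extended by one character.
    The final factor may be a repeat if the text ends mid-match."""
    trie: Dict[Tuple[int, str], int] = {}  # (node, char) -> child; 0 is the root
    factors: List[str] = []
    node, start = 0, 0
    for i, ch in enumerate(text):
        child = trie.get((node, ch))
        if child is not None:
            node = child  # keep extending the current match
        else:
            trie[(node, ch)] = len(factors) + 1  # new trie node for this factor
            factors.append(text[start:i + 1])
            node, start = 0, i + 1
    if start < len(text):
        factors.append(text[start:])
    return factors
```

For example, the rules `X1 -> b`, `X2 -> a`, `X3 -> X2 X1`, `X4 -> X3 X2`, and so on up to `X7 -> X6 X5` form an SLP of size $n = 7$ for the Fibonacci string `abaababaabaab` of length $N = 13$. Calling `lz78_factorize(slp_expand(rules, "X7"))` on it yields the $m = 6$ factors `a`, `b`, `aa`, `ba`, `baa`, `baab`.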
