A really simple approximation of smallest grammar

Artur Jeż

arXiv:1403.4445 · cs.DS · March 19, 2014 · 1 citation

TL;DR

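This paper introduces a simple linear-time algorithm that builds a context-free grammar of size O(g log(N/g)) for a string of length N, where g is the size of the smallest grammar generating it. Algorithms with the same approximation guarantee and running time were already known; the gain here is a much simpler algorithm and analysis.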

Contribution

The paper presents a straightforward linear-time algorithm for approximating the smallest grammar, using LZ77 factorization and phased pair replacements, with a clear analysis of its approximation bounds.

Findings

01

Constructs a grammar of size O(g log(N/g)), where N is the input length and g is the size of the smallest grammar

02

Runs in linear time, assuming the alphabet can be identified with numbers from {1, …, N^c} for some constant c

03

Uses LZ77 factorization and phased pair replacements
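As a rough illustration of what a single pair replacement does, the Python sketch below replaces every occurrence of one pair of distinct symbols in an explicitly stored word with a fresh nonterminal. It is a toy under stated assumptions: the name `replace_pair` and the fresh symbol `X` are hypothetical, and the function works on the plain word rather than on the LZ77-like factorisation the paper maintains, so it does not show how the paper selects pairs or achieves linear time.

```python
def replace_pair(word: list[str], pair: tuple[str, str], fresh: str) -> list[str]:
    """Replace each occurrence of `pair` in `word` with the symbol `fresh`.

    Assumes the two symbols of `pair` are distinct, so occurrences
    cannot overlap. The caller is expected to record the grammar rule
    fresh -> pair[0] pair[1].
    """
    out = []
    i = 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(fresh)  # one nonterminal stands in for two symbols
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


# Usage: replacing "ab" in "abcab" with the hypothetical nonterminal "X".
compressed = replace_pair(list("abcab"), ("a", "b"), "X")
print(compressed)  # ['X', 'c', 'X']
```

Each replacement adds one rule of constant size to the grammar while shortening the word, which is why repeated replacements can eventually reduce the word to a single symbol.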

Abstract

In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1, …, N^c} for some constant c. Algorithms with such an approximation guarantee and running time are known; however, all of them were non-trivial and their analyses were involved. The algorithm presented here computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional O(l) letters, where l was the size of the original LZ77 factorisation. In one phase in a…
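To make the starting point concrete, here is a minimal Python sketch of a greedy LZ77-style factorisation. It assumes the common definition in which each factor is either a single letter or the longest prefix of the remaining suffix that also occurs starting at an earlier position (overlap with the current position allowed); the paper's exact definition may differ in details. The function name `lz77_factorize` is hypothetical, and this naive version takes superlinear time, whereas the linear-time bound in the abstract requires a more careful construction.

```python
def lz77_factorize(w: str) -> list[str]:
    """Greedy LZ77-style factorisation of w (naive, not linear time)."""
    factors = []
    i = 0
    n = len(w)
    while i < n:
        best = 0
        # Try every earlier start position j < i; the copied occurrence
        # is allowed to run past position i (self-overlap).
        for j in range(i):
            k = 0
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            best = max(best, k)
        # A factor is the longest earlier match, or one fresh letter
        # when no earlier occurrence exists.
        length = max(best, 1)
        factors.append(w[i:i + length])
        i += length
    return factors


# Usage: the third factor copies from position 0 and overlaps itself.
print(lz77_factorize("abababab"))  # ['a', 'b', 'ababab']
```

The number of factors returned corresponds to the quantity l in the abstract: highly repetitive inputs yield few factors, which is what lets the resulting grammar stay small.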

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing