A really simple approximation of smallest grammar
Artur Jeż

TL;DR
This paper introduces a simple linear-time algorithm that builds a context-free grammar of size O(g log(N/g)) for a string of length N, where g is the size of the smallest grammar. It matches the guarantees of earlier methods while being much simpler to state and analyse.
Contribution
The paper presents a straightforward linear-time algorithm for approximating the smallest grammar, using LZ77 factorization and phased pair replacements, with a clear analysis of its approximation bounds.
Findings
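Constructs a grammar of size O(g log(N/g)), where N is the input length and g the size of the optimal grammar
Runs in linear time, assuming the alphabet can be identified with numbers from {1, …, N^c} for some constant c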
Uses LZ77 factorization and phased pair replacements
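To make "pair replacement" concrete, here is a minimal sketch of the generic operation: every non-overlapping occurrence of a chosen pair of adjacent symbols is replaced by a fresh nonterminal, and a rule for that nonterminal is recorded. This is an illustration under assumptions, not the paper's procedure: it works on an explicit sequence rather than on an LZ77-like factorisation, and the names `replace_pair` and `expand` are hypothetical.

```python
def replace_pair(seq, pair, fresh):
    """Replace non-overlapping occurrences of `pair` in `seq` by `fresh`,
    scanning left to right. Illustrative helper, not taken from the paper."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(fresh)   # one nonterminal stands for two symbols
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def expand(seq, rules):
    """Expand nonterminals back into letters using `rules`,
    a dict mapping each nonterminal to the pair it replaced."""
    out = []
    for sym in seq:
        if sym in rules:
            out.extend(expand(list(rules[sym]), rules))
        else:
            out.append(sym)
    return out
```

On the word abcabcab, replacing the pair (a, b) by X and then (X, c) by Y leaves the sequence Y Y X together with two rules of length 2, for a total grammar size of 7 against the original 8 letters.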
Abstract
In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log (N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1, …, N^c} for some constant c. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses were involved. The here presented algorithm computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional O(l) letters, where l was the size of the original LZ77 factorisation. In one phase in a…
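As a reminder of what an LZ77 factorisation is, the sketch below computes one naively. It is only an illustration under assumptions: it uses the self-referential variant (a factor may overlap its own source), it runs in quadratic time or worse rather than the linear time the abstract refers to, and the function names `lz77_factorise` and `lz77_decode` are hypothetical.

```python
def lz77_factorise(s):
    """Naive LZ77 factorisation of the string `s`.

    Each factor is either a single fresh letter or a pair (src, length)
    meaning "copy `length` symbols starting at earlier position `src`".
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):              # try every earlier start position
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1                  # the match may run past position i
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append(s[i])        # letter not seen before
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the string from its factors, copying symbol by symbol
    so that overlapping (self-referential) factors work."""
    out = []
    for f in factors:
        if isinstance(f, tuple):
            src, length = f
            for k in range(length):
                out.append(out[src + k])
        else:
            out.append(f)
    return "".join(out)
```

For the word abababab this yields three factors: the letters a and b, followed by one factor copying six symbols from position 0. The number of factors plays the role of l in the abstract.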
Taxonomy
Topics: Algorithms and Data Compression · semigroups and automata theory · DNA and Biological Computing
