A Grammar Compression Algorithm based on Induced Suffix Sorting
Daniel Saad Nogueira Nunes, Felipe A. Louza, Simon Gog, Mauricio Ayala-Rincón, Gonzalo Navarro

TL;DR
This paper presents GCIS, a grammar compression algorithm built on induced suffix sorting. It compresses repetitive strings effectively and competes with established tools such as Re-Pair and 7-zip in compression ratio and in compression and decompression time.
Contribution
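The paper introduces GCIS, a grammar compression method based on SAIS. It reuses the factorization that SAIS performs during suffix sorting to build a context-free grammar, then encodes that grammar by exploiting redundancies such as common prefixes between sorted rules.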
Findings
GCIS achieves compression ratios competitive with Re-Pair and 7-zip.
GCIS is also competitive in compression and decompression running time.
GCIS is very effective on highly repetitive strings.
Abstract
We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, introduced by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string, which can be further reduced into a shorter string by replacing each substring with its corresponding factor. The resulting grammar is encoded by exploiting redundancies, such as common prefixes between suffix rules, which are sorted according to the SAIS framework. Compared to well-known compression tools such as Re-Pair and 7-zip, our algorithm is competitive and very effective at handling repetitive strings in terms of compression ratio and compression and decompression running time.
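The factorization step described in the abstract can be illustrated with a minimal Python sketch of a single round. This is not the authors' implementation, and it rests on several assumptions. Factors are cut at LMS (leftmost S-type) positions and do not overlap. The rules are ordered with a plain lexicographic sort rather than induced sorting. The input never contains the sentinel character. The function names (`lms_positions`, `factorize`, `expand`, `front_code`) are hypothetical. The last function shows one way common prefixes between consecutive sorted rules could be removed (front coding); the paper's actual encoding may differ.

```python
SENTINEL = "\0"  # assumption: smaller than every character of the input


def lms_positions(text):
    """Return the LMS positions of `text`, which must end with SENTINEL."""
    n = len(text)
    is_s = [False] * n
    is_s[n - 1] = True  # the sentinel suffix is S-type by definition
    for i in range(n - 2, -1, -1):
        # Suffix i is S-type if it is lexicographically smaller than suffix i+1.
        if text[i] < text[i + 1] or (text[i] == text[i + 1] and is_s[i + 1]):
            is_s[i] = True
    # An LMS position is an S-type position preceded by an L-type position.
    return [i for i in range(1, n) if is_s[i] and not is_s[i - 1]]


def factorize(s):
    """One round: split `s` at LMS positions and name each distinct factor."""
    cuts = lms_positions(s + SENTINEL)  # the last cut is the sentinel position
    head = s[:cuts[0]] if cuts else s   # prefix before the first LMS position
    factors = [s[a:b] for a, b in zip(cuts, cuts[1:])]
    rules = sorted(set(factors))        # stand-in for the SAIS ordering
    name = {f: k for k, f in enumerate(rules)}
    reduced = [name[f] for f in factors]  # the shorter string over rule names
    return head, rules, reduced


def expand(head, rules, reduced):
    """Invert `factorize` by substituting each name with its factor."""
    return head + "".join(rules[k] for k in reduced)


def front_code(rules):
    """Encode each sorted rule as (shared prefix length, remaining suffix)."""
    coded, prev = [], ""
    for r in rules:
        lcp = 0
        while lcp < min(len(prev), len(r)) and prev[lcp] == r[lcp]:
            lcp += 1
        coded.append((lcp, r[lcp:]))
        prev = r
    return coded


if __name__ == "__main__":
    head, rules, reduced = factorize("banana")
    print(head, rules, reduced, front_code(rules))
    assert expand(head, rules, reduced) == "banana"
```

In this sketch the reduced string is a list of rule indices. SAIS proceeds recursively on the reduced string, and the same factorization could in principle be applied again to it, but the sketch stops after one round to stay short.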
