Practical Repetition-Aware Grammar Compression
Isamu Furuya

TL;DR
This paper introduces a practical encoding scheme for MR-RePair, an improved grammar compression method, and extends it to run-length grammars, demonstrating effective compression on repetitive datasets.
Contribution
It proposes a new encoding method for MR-RePair and a novel run-length variant called RL-MR-RePair, enhancing compression of repetitive data.
Findings
Comparative experiments show that the proposed encoding method for MR-RePair is effective.
Combined with the proposed encoding, RL-MR-RePair performs well on real repetitive datasets.
The resulting scheme is practical for compressing repetitive text.
Abstract
The goal of grammar compression is to construct a small context-free grammar that uniquely generates the input text. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement on RePair that constructs smaller context-free grammars for repetitive text data. However, no compact encoding scheme has been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context-free grammars and design a novel variant for them called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method shows good performance on real repetitive datasets.
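To make the idea of grammar compression concrete, the following is a minimal sketch of the classic RePair heuristic that MR-RePair builds on: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. It is not MR-RePair, RL-MR-RePair, or the paper's encoding method. The function names `repair` and `expand` and the tuple representation of nonterminals are assumptions made for illustration, and the quadratic rescanning is chosen for brevity rather than efficiency.

```python
from collections import Counter


def repair(text):
    """Naive RePair sketch: returns (start sequence, rules).

    Terminals are single characters; nonterminals are ("N", k) tuples,
    so the two kinds of symbols can never collide.
    """
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        # Count adjacent pairs. Overlapping occurrences (as in "aaa")
        # are counted naively, which is simpler than real RePair.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so a new rule would not help
        nonterminal = ("N", next_id)
        next_id += 1
        rules[nonterminal] = pair
        # Replace occurrences of the pair left to right, without overlap.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the string it generates."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


text = "abracadabra abracadabra"
seq, rules = repair(text)
restored = "".join(expand(s, rules) for s in seq)
```

In this sketch, `restored` equals the original `text`, which reflects the requirement that the grammar uniquely generates the input. On a highly repetitive input such as `"abababab"`, the loop ends with a two-symbol start sequence and two rules. A run-length grammar would additionally allow a rule that repeats one symbol a given number of times, which this sketch does not model.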
Taxonomy
Topics: Algorithms and Data Compression · Music and Audio Processing · Video Analysis and Summarization
