RePair in Compressed Space and Time
Kensuke Sakai, Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

TL;DR
This paper introduces a new algorithm for computing the RePair grammar in compressed space. It reduces memory usage and enables the processing of large-scale texts while producing the same grammar, and hence the same compression ratio, as RePair.
Contribution
It presents the first RePair algorithm that works in compressed space, based on efficiently restructuring an arbitrary grammar for the text into the RePair grammar. It also gives a way to reduce the peak memory usage of existing RePair methods.
Findings
Algorithm runs in compressed space for highly compressible texts.
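Achieves expected $O(\min(N, nm \log N)\, m)$ time, where $N$ is the text length, $n$ is the size of the input grammar, and $m$ is the number of variables in the output RePair grammar.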
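Combined with existing RePair algorithms, the new approach outperforms the most space-efficient linear-time RePair implementation to date in both computation time and space.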
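Why compressed space can be much smaller than $N$: a grammar with $n$ rules can derive a text exponentially longer than $n$, and many quantities can be computed directly on the rules without decompressing. The sketch below is a hypothetical illustration, not code from the paper. It represents a grammar as a straight-line program (each variable is a single character or a pair of earlier variables); the names `expansion_lengths`, `expand`, and `doubling_grammar` are assumptions made for this example.

```python
from typing import List, Tuple, Union

# A straight-line program: rule i is either a terminal character
# or a pair (left, right) of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each variable, in O(n) time, without expanding."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(len(rule))
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def expand(rules: List[Rule], var: int) -> str:
    """Fully decompress variable `var`; this costs time and space linear in its length."""
    rule = rules[var]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(rules, left) + expand(rules, right)


def doubling_grammar(k: int) -> List[Rule]:
    """Rules X0 -> 'a' and X_i -> X_{i-1} X_{i-1}: k + 1 rules deriving 'a' * 2**k."""
    rules: List[Rule] = ["a"]
    for i in range(1, k + 1):
        rules.append((i - 1, i - 1))
    return rules


rules = doubling_grammar(40)
print(len(rules), expansion_lengths(rules)[-1])  # 41 1099511627776
```

Here 41 rules describe a text of about $10^{12}$ characters; an algorithm whose working space depends on the grammar size rather than on $N$ is what "compressed space" refers to.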
Abstract
Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already gave a time-optimal algorithm that computes the RePair grammar RePair($T$) in expected $O(N)$ time, research on reducing its working space remains active so that it can be applied to large-scale data. In this paper, we propose the first RePair algorithm working in compressed space, i.e., potentially $o(N)$ space for highly compressible texts. The key idea is a new way to restructure an arbitrary grammar $S$ for $T$ into RePair($T$) in compressed space and time. Based on the recompression technique, we propose an algorithm for RePair($T$) in $O(\min(N, nm…
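To make the object being computed concrete, here is a naive sketch of the classic RePair procedure: repeatedly replace a most frequent pair of adjacent symbols with a new nonterminal until no pair occurs twice. This is a quadratic-time textbook version that holds the whole uncompressed text in memory. It is neither the paper's compressed-space algorithm nor Larsson and Moffat's linear-time implementation. Tie-breaking among equally frequent pairs (first pair seen wins) and all function names are assumptions of this illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Symbol = Union[str, int]  # str = terminal character, int = nonterminal id


def count_pairs(seq: List[Symbol]) -> Counter:
    """Count non-overlapping occurrences of each adjacent pair (matters for runs like 'aaa')."""
    counts: Counter = Counter()
    next_free: Dict[Tuple[Symbol, Symbol], int] = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if next_free.get(pair, 0) <= i:
            counts[pair] += 1
            next_free[pair] = i + 2
    return counts


def replace_pair(seq: List[Symbol], pair: Tuple[Symbol, Symbol], new_symbol: int) -> List[Symbol]:
    """Replace occurrences of `pair` greedily from left to right, without overlaps."""
    out: List[Symbol] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text: str) -> Tuple[List[Symbol], Dict[int, Tuple[Symbol, Symbol]]]:
    """Naive RePair: returns the final sequence and the rules {nonterminal: (left, right)}."""
    seq: List[Symbol] = list(text)
    rules: Dict[int, Tuple[Symbol, Symbol]] = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]  # on ties, the first pair seen wins
        if freq < 2:
            break  # no pair occurs twice, so RePair stops
        new_symbol = len(rules)
        rules[new_symbol] = pair
        seq = replace_pair(seq, pair, new_symbol)
    return seq, rules


def expand(symbol: Symbol, rules: Dict[int, Tuple[Symbol, Symbol]]) -> str:
    """Decompress one symbol of the final sequence back into text."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


seq, rules = repair("abracadabra")
print(seq, rules)  # [2, 'c', 'a', 'd', 2] {0: ('a', 'b'), 1: (0, 'r'), 2: (1, 'a')}
```

Joining `expand(s, rules)` over the final sequence recovers the input. The `seq = list(text)` line is exactly the $\Theta(N)$ working space that a compressed-space algorithm avoids by operating on a grammar for $T$ instead.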
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · semigroups and automata theory
