# Re-Pair In Small Space

**Authors:** Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto

arXiv: 1908.04933 · 2019-11-19

## TL;DR

This paper presents an algorithm that computes the Re-Pair grammar compression of a length-$n$ text in $n \lg \max(n,\tau)$ bits of space, text included, avoiding the large frequency tables that make Re-Pair hard to compute on large data sets.

## Contribution

The algorithm works in the restore model: after computing Re-Pair, the original input can be recovered using only $O(\lg n)$ additional bits of working space. Variants for the parallel and external memory models are also given.

## Key findings

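- Computes Re-Pair in $O(n^2) \cap O(n^2 \lg \log_\tau n \lg \lg \lg n / \log_\tau n)$ time using $n \lg \max(n,\tau)$ bits of space including the text, where $\tau$ is the number of terminals and non-terminals.
- Supports recovery of the original input in the time of the Re-Pair computation, with $O(\lg n)$ additional bits of working space.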
- Provides variants for parallel and external memory models.

## Abstract

Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an $O(n^2) \cap O(n^2 \lg \log_\tau n \lg \lg \lg n / \log_\tau n)$ time algorithm computing Re-Pair in $n \lg \max(n,\tau)$ bits of space including the text space, where $\tau$ is the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input in the time for the Re-Pair computation with $O(\lg n)$ additional bits of working space. We give variants of our solution working in parallel or in the external memory model.
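For orientation, the sketch below shows the textbook Re-Pair scheme that the abstract takes as its starting point: repeatedly replace the most frequent bigram with a fresh non-terminal until no bigram occurs twice. It is a naive illustration, not the paper's small-space algorithm. It rebuilds a full frequency table (a `Counter`) in every round, which is exactly the kind of extra memory the paper aims to avoid. The function names (`count_bigrams`, `replace_pair`, `repair`, `expand`), the tie-breaking rule, and the use of Python integers as non-terminals are assumptions made for this example.

```python
from collections import Counter


def count_bigrams(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts = Counter()
    last_counted = {}  # pair -> start index of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Only pairs of equal symbols (e.g. "aa" in "aaa") can overlap.
        if last_counted.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair by symbol, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive Re-Pair: returns (start sequence, rules).

    Terminals are the characters of text; non-terminals are the
    integers 0, 1, 2, ... (an assumption of this sketch).
    """
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        counts = count_bigrams(seq)  # the frequency table, rebuilt each round
        if not counts:
            break
        # Ties are broken by first appearance (arbitrary choice).
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        rules[next_id] = pair
        seq = replace_pair(seq, pair, next_id)
        next_id += 1
    return seq, rules


def expand(seq, rules):
    """Decompress: expand all non-terminals back into the original text."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)


if __name__ == "__main__":
    seq, rules = repair("abracadabra")
    print(seq, rules)
    print(expand(seq, rules))
```

Each round scans the current sequence once and shortens it by at least two symbols, so this version also runs in quadratic time in the worst case, but it keeps a hash table of bigram counts and a second copy of the sequence alongside the text. Note that `expand` is ordinary grammar decompression; the restore model in the abstract refers to recovering the input from the algorithm's working space, which this sketch does not attempt to reproduce.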

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04933/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04933/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1908.04933/full.md

---
Source: https://tomesphere.com/paper/1908.04933