Space-efficient SLP encoding for $O(\log N)$-time random access

Akito Takasaka; Tomohiro I

arXiv:2406.15011·cs.DS·January 9, 2026

TL;DR

This paper introduces space-efficient encodings of SLPs that support extracting any substring $T[p..q]$ of the length-$N$ string $T$ in near-optimal worst-case $O(\log N + q - p)$ time.

Contribution

It presents novel encoding schemes for SLPs that support fast random access with near-optimal space complexity, extending previous work on compressed string representations.

Findings

01

Supports substring extraction in $O(\log N + q - p)$ time.

02

Uses space close to the information-theoretic lower bound.

03

Provides multiple encoding variants with different space-time trade-offs.

Abstract

A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives $T$ only, which can be considered as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n + n') \lceil \lg (n + \sigma) \rceil + 4n - 2n' + o(n)$ bits to support random access queries of extracting $T[p..q]$ in worst-case $O(\log N + q - p)$ time, where $N$ is the length of $T$, $\sigma$ is the alphabet size, $n$ is the number of variables in $G$ and $n' \leq n$ is the number of symmetric centroid paths in the DAG representation for $G$. The time complexity is almost optimal because Verbin and Yu [CPM 2013] proved that $O(\log N)$ term cannot be significantly improved in general with $\mathrm{poly}(n)$-space data structures. We also present alternative encodings that achieve the same random access time with $n \lceil \lg N \rceil + n \lceil…
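To make the random access query concrete, here is a minimal Python sketch of extracting $T[p..q]$ from an SLP without expanding all of $T$. It is not the paper's encoding: it stores the grammar in plain dictionaries and uses the textbook top-down descent guided by expansion lengths, which takes $O(h + q - p)$ time for a derivation tree of height $h$ rather than the $O(\log N + q - p)$ bound the paper obtains through symmetric centroid paths. The names `Rule`, `expansion_lengths`, `extract`, and the toy grammar are all hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy representation (not the paper's bit-level encoding):
# a variable maps to one character (terminal rule) or to a pair of
# variable names (binary rule X -> Y Z).
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule], order: List[str]) -> Dict[str, int]:
    """Compute the length of each variable's expansion.

    `order` must list variables bottom-up (children before parents).
    """
    length: Dict[str, int] = {}
    for x in order:
        rhs = rules[x]
        length[x] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    return length


def extract(rules: Dict[str, Rule], length: Dict[str, int],
            start: str, p: int, q: int) -> str:
    """Return T[p..q] (0-based, inclusive), where T is the expansion of `start`."""
    if not (0 <= p <= q < length[start]):
        raise IndexError("need 0 <= p <= q < N")
    out: List[str] = []
    # Each stack entry (x, lo, hi) means: emit positions lo..hi of exp(x).
    stack = [(start, p, q)]
    while stack:
        x, lo, hi = stack.pop()
        rhs = rules[x]
        if isinstance(rhs, str):
            out.append(rhs)
            continue
        left, right = rhs
        m = length[left]  # positions 0..m-1 of exp(x) come from `left`
        # Push the right part first so the left part is emitted first.
        if hi >= m:
            stack.append((right, max(lo - m, 0), hi - m))
        if lo < m:
            stack.append((left, lo, min(hi, m - 1)))
    return "".join(out)


# Toy SLP deriving T = "abaababa" (N = 8, sigma = 2, n = 6 variables).
rules: Dict[str, Rule] = {
    "A": "a", "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}
order = ["A", "B", "X1", "X2", "X3", "X4"]
length = expansion_lengths(rules, order)

print(extract(rules, length, "X4", 2, 5))  # prints "aaba"
```

The descent only visits the two boundary paths plus the subtrees lying entirely inside $[p..q]$, which is why the cost depends on the tree height and the output length rather than on $N$. For an unbalanced grammar the height $h$ can be much larger than $\log N$, and closing that gap within compact space is the problem the abstract addresses.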

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Error Correcting Code Techniques