Grammar Compressed Sequences with Rank/Select Support
Alberto Ordóñez, Gonzalo Navarro, Nieves R. Brisaboa

TL;DR
This paper introduces grammar-based compressed representations for highly repetitive sequences. They support access, rank, and select operations within tens of microseconds while using up to 6% of the space needed by statistically compressed representations.
Contribution
The authors present grammar-based representations for highly repetitive sequences, where classical statistical compression is ineffective, that greatly reduce space while still supporting direct access and rank/select queries.
Findings
Uses up to 6% of the space needed by statistically compressed representations.
Supports access and rank/select operations within tens of microseconds.
Impact demonstrated in text indexing applications.
Abstract
Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly repetitive sequences, and classical statistical compression proves ineffective. We introduce, instead, grammar-based representations for repetitive sequences, which use up to 6% of the space needed by statistically compressed representations, and support direct access and rank/select operations within tens of microseconds. We demonstrate the impact of our structures in text indexing applications.
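To make the three operations concrete, here is a minimal Python sketch over a toy grammar. It is an illustration under stated assumptions, not the authors' data structure: the class name `ToyGrammarSequence`, the binary rule format, and the choice to store a full symbol count per nonterminal are all hypothetical and chosen for readability rather than space. Here `access(i)` returns the symbol at 0-based position `i`, `rank(c, i)` counts occurrences of `c` among the first `i` symbols, and `select(c, j)` returns the 0-based position of the `j`-th occurrence of `c`. Each query walks down the grammar without expanding the sequence.

```python
from collections import Counter


class ToyGrammarSequence:
    """Toy grammar-compressed sequence (hypothetical, for illustration only).

    rules maps each nonterminal to a pair (left, right); any symbol that is
    not a key of rules is treated as a terminal of length 1.
    """

    def __init__(self, rules, start):
        self.rules = rules
        self.start = start
        self._len = {}  # memoized expansion length per nonterminal
        self._cnt = {}  # memoized symbol counts per nonterminal

    def length(self, sym):
        if sym not in self.rules:
            return 1
        if sym not in self._len:
            left, right = self.rules[sym]
            self._len[sym] = self.length(left) + self.length(right)
        return self._len[sym]

    def counts(self, sym):
        if sym not in self.rules:
            return Counter([sym])
        if sym not in self._cnt:
            left, right = self.rules[sym]
            self._cnt[sym] = self.counts(left) + self.counts(right)
        return self._cnt[sym]

    def access(self, i):
        """Symbol at 0-based position i."""
        if not 0 <= i < self.length(self.start):
            raise IndexError(i)
        sym = self.start
        while sym in self.rules:
            left, right = self.rules[sym]
            if i < self.length(left):
                sym = left
            else:
                i -= self.length(left)
                sym = right
        return sym

    def rank(self, c, i):
        """Number of occurrences of c among the first i symbols."""
        if not 0 <= i <= self.length(self.start):
            raise IndexError(i)
        sym, total = self.start, 0
        while sym in self.rules:
            left, right = self.rules[sym]
            if i <= self.length(left):
                sym = left
            else:
                # The whole left child lies inside the prefix: add its count.
                total += self.counts(left)[c]
                i -= self.length(left)
                sym = right
        return total + (1 if i >= 1 and sym == c else 0)

    def select(self, c, j):
        """0-based position of the j-th (1-based) occurrence of c."""
        if not 1 <= j <= self.counts(self.start)[c]:
            raise ValueError("no such occurrence")
        sym, pos = self.start, 0
        while sym in self.rules:
            left, right = self.rules[sym]
            in_left = self.counts(left)[c]
            if j <= in_left:
                sym = left
            else:
                j -= in_left
                pos += self.length(left)
                sym = right
        return pos


# Three rules generate the repetitive sequence "abababab".
rules = {"X1": ("a", "b"), "X2": ("X1", "X1"), "X3": ("X2", "X2")}
seq = ToyGrammarSequence(rules, "X3")

print(seq.access(3))       # b
print(seq.rank("a", 5))    # 3
print(seq.select("b", 3))  # 5
```

Each query visits one rule per level of the grammar, so its cost grows with the grammar's height rather than with the sequence length. Storing a full count table per nonterminal, as this sketch does, would be costly for large alphabets; a practical structure has to trade that space against query time.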
