Counting on General Run-Length Grammars

Gonzalo Navarro; Alejandro Pacheco

arXiv:2406.00221 · cs.DS · January 30, 2025

TL;DR

This paper presents a data structure that counts pattern occurrences in texts compressed with any run-length context-free grammar. It uses space proportional to the grammar size, answers queries in O(m log^{2+ε} n) time, and solves an open problem in compressed pattern matching.

Contribution

It introduces the first data structure that counts pattern occurrences in texts compressed with any run-length context-free grammar while using space proportional to the grammar size and O(m log^{2+ε} n) query time.

Findings

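01 Counts the occurrences of a pattern of length m in a text of length n in O(m log^{2+ε} n) time, for any constant ε > 0 chosen at indexing time.

02 Uses space proportional to the grammar size.

03 Gives the first solution to an open problem posed by Christiansen et al. [ACM TALG 2020].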

Abstract

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in a text of length $n$ in time $O(m\log^{2+\epsilon} n)$, for any constant $\epsilon > 0$ chosen at indexing time. This is the first solution to an open problem posed by Christiansen et al. [ACM TALG 2020] and enhances our abilities for computation over compressed data; we give an example application.
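To make the setting concrete, here is a minimal sketch of what a run-length context-free grammar looks like and what "counting" means. It is not the paper's data structure: the toy grammar, the rule encoding, and the function names (`RULES`, `length`, `expand`, `count_naive`) are all assumptions made for illustration, and the counter simply decompresses the text and scans it, which is exactly the cost a compressed index is meant to avoid.

```python
from functools import lru_cache

# Hypothetical toy grammar (not from the paper). Each symbol has one rule:
#   ("term", c)     A -> c      a single terminal character
#   ("pair", B, C)  A -> B C    concatenation of two symbols
#   ("run", B, t)   A -> B^t    t consecutive copies of B (the run-length rule)
RULES = {
    "a": ("term", "a"),
    "b": ("term", "b"),
    "X": ("pair", "a", "b"),  # X expands to "ab"
    "Y": ("run", "X", 4),     # Y expands to "abababab"
    "S": ("pair", "Y", "a"),  # S expands to "ababababa"
}


@lru_cache(maxsize=None)
def length(sym):
    """Length of the expansion of sym, computed without expanding it."""
    kind, *args = RULES[sym]
    if kind == "term":
        return 1
    if kind == "pair":
        return length(args[0]) + length(args[1])
    return length(args[0]) * args[1]  # run rule: |B| * t


def expand(sym):
    """Fully decompress sym into a string (time linear in the text length)."""
    kind, *args = RULES[sym]
    if kind == "term":
        return args[0]
    if kind == "pair":
        return expand(args[0]) + expand(args[1])
    return expand(args[0]) * args[1]  # run rule: repeat the expansion t times


def count_naive(text, pattern):
    """Count (possibly overlapping) occurrences of pattern by direct scan."""
    return sum(
        text.startswith(pattern, i)
        for i in range(len(text) - len(pattern) + 1)
    )


if __name__ == "__main__":
    text = expand("S")
    print(text, length("S"), count_naive(text, "aba"))
```

The run rule is what lets a grammar of constant size describe a repetition of any length, so the text length n can be far larger than the grammar size. The abstract's structure answers the same counting query within space proportional to the grammar size, without materializing the text as `expand` does here.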

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · DNA and Biological Computing