Counting on General Run-Length Grammars
Gonzalo Navarro, Alejandro Pacheco

TL;DR
This paper presents a data structure that counts pattern occurrences in texts compressed with run-length grammars in O(m log^{2+ε} n) time, using space proportional to the grammar size, and thereby solves an open problem in compressed pattern matching.
Contribution
It introduces the first data structure for counting pattern occurrences in run-length grammar compressed texts with space proportional to grammar size and efficient query time.
Findings
Counts the occurrences of a pattern of length m in a text of length n in O(m log^{2+ε} n) time, for any constant ε > 0 chosen at indexing time.
Uses space proportional to grammar size.
Solves an open problem in compressed pattern matching.
Abstract
We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length \(m\) in a text of length \(n\) in time \(O(m\log^{2+\epsilon} n)\), for any constant \(\epsilon > 0\) chosen at indexing time. This is the first solution to an open problem posed by Christiansen et al. [ACM TALG 2020] and enhances our abilities for computation over compressed data; we give an example application.
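To make the problem statement concrete, the following is a minimal sketch of a run-length context-free grammar and of the naive way to count pattern occurrences: decompress the text and scan it. This is not the paper's data structure, which answers the query without decompressing. The rule encoding (`"term"`, `"pair"`, `"run"`) and the function names `expand` and `count_occurrences` are assumptions made only for this illustration.

```python
from typing import Dict, Optional

# Hypothetical rule encoding, chosen for this sketch only:
#   ("term", c)     A -> c      a single terminal character
#   ("pair", B, C)  A -> B C    concatenation of two symbols
#   ("run",  B, t)  A -> B^t    run-length rule: t copies of B
Grammar = Dict[str, tuple]


def expand(rules: Grammar, symbol: str,
           memo: Optional[Dict[str, str]] = None) -> str:
    """Decompress `symbol` into the string it derives (memoized)."""
    if memo is None:
        memo = {}
    if symbol in memo:
        return memo[symbol]
    rule = rules[symbol]
    kind = rule[0]
    if kind == "term":
        text = rule[1]
    elif kind == "pair":
        text = expand(rules, rule[1], memo) + expand(rules, rule[2], memo)
    elif kind == "run":
        text = expand(rules, rule[1], memo) * rule[2]
    else:
        raise ValueError(f"unknown rule kind: {kind!r}")
    memo[symbol] = text
    return text


def count_occurrences(text: str, pattern: str) -> int:
    """Count all occurrences of `pattern` in `text`, overlaps included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


# Five rules derive a text of length 9; a larger run exponent would
# lengthen the text without adding any rule.
rules: Grammar = {
    "A": ("term", "a"),
    "B": ("term", "b"),
    "C": ("pair", "A", "B"),   # C -> ab
    "D": ("run", "C", 4),      # D -> (ab)^4
    "S": ("pair", "D", "A"),   # S -> (ab)^4 a
}

text = expand(rules, "S")
print(text)                            # ababababa
print(count_occurrences(text, "aba"))  # 4
```

The baseline takes time and space proportional to the text length n, which can be far larger than the grammar. The structure described in the abstract instead keeps space proportional to the grammar size and answers the count in \(O(m\log^{2+\epsilon} n)\) time.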
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · DNA and Biological Computing
