Improved Grammar-Based Compressed Indexes
Francisco Claude, Gonzalo Navarro

TL;DR
This paper presents a grammar-based compressed index that supports pattern searching and substring extraction directly on grammar-compressed text, with search time that depends only logarithmically on the grammar size and space close to that of the grammar representation.
Contribution
It introduces the first grammar-compressed index supporting searches with time complexity logarithmic in the grammar size, improving efficiency over previous methods.
Findings
Supports pattern search in O((m^2/ε) log (log u / log n) + occ log n) time.
Uses 2N log n + N log u + εn log n + o(N log n) bits of space for any 0 < ε ≤ 1, where a basic grammar representation takes N log n bits.
Extracts any substring of length ℓ in O(ℓ + h log(N/h)) time, where h is the height of the grammar tree.
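To get a feel for how the search bound behaves, the sketch below evaluates its two terms for sample parameters. It is only an illustration: the function name is hypothetical, the logarithm base (2) is an assumption that asymptotic notation leaves open, and the hidden constant factors are ignored, so the numbers indicate growth rather than actual running time.

```python
import math

def search_bound_terms(m, eps, u, n, occ):
    """Evaluate the two terms of (m^2/eps) log(log u / log n) + occ log n.

    Assumes base-2 logarithms and u > n > 1; constants hidden by the
    O-notation are ignored.
    """
    pattern_term = (m * m / eps) * math.log2(math.log2(u) / math.log2(n))
    occurrence_term = occ * math.log2(n)
    return pattern_term, occurrence_term

# Sample values (assumed for illustration): a pattern of length 10,
# a text of length 2^40, a grammar with 2^10 symbols, 3 occurrences.
pattern_term, occurrence_term = search_bound_terms(
    m=10, eps=0.5, u=2**40, n=2**10, occ=3
)
print(pattern_term, occurrence_term)
```

The first term depends on the pattern length and on ε, which trades space for time; the second is paid once per reported occurrence.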
Abstract
We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text T[1..u] that is represented by a (context-free) grammar of n (terminal and nonterminal) symbols and size N (measured as the sum of the lengths of the right hands of the rules), a basic grammar-based representation of T takes N log n bits of space. Our representation requires 2N log n + N log u + εn log n + o(N log n) bits of space, for any 0 < ε ≤ 1. It can find the positions of the occ occurrences of a pattern of length m in T in O((m^2/ε) log (log u / log n) + occ log n) time, and extract any substring of length ℓ of T in time O(ℓ + h log(N/h)), where h is the height of the grammar tree.
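The quantities named in the abstract can be made concrete on a toy grammar. The sketch below is not the paper's data structure: it stores the rules in a plain dictionary and extracts a substring by a naive top-down descent guided by expansion lengths, so it does not achieve the stated space or time bounds. The grammar, the function names, and the convention that height counts rule applications on the longest root-to-leaf path are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical toy grammar: each nonterminal maps to the right-hand side
# of its single rule. Symbols that are not keys are terminals.
RULES = {
    "A": ["a", "b"],       # A -> a b
    "B": ["A", "A", "c"],  # B -> A A c
    "S": ["B", "A", "B"],  # S -> B A B   (start symbol)
}

def grammar_size(rules):
    """Grammar size: sum of the lengths of the right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())

def symbol_count(rules):
    """Number of distinct terminal plus nonterminal symbols."""
    terminals = {s for rhs in rules.values() for s in rhs if s not in rules}
    return len(rules) + len(terminals)

@lru_cache(maxsize=None)
def length(sym):
    """Length of the string that sym expands to."""
    if sym not in RULES:
        return 1
    return sum(length(s) for s in RULES[sym])

@lru_cache(maxsize=None)
def height(sym):
    """Rule applications on the longest path from sym down to a terminal."""
    if sym not in RULES:
        return 0
    return 1 + max(height(s) for s in RULES[sym])

def extract(sym, start, end):
    """Return positions [start, end) (0-based) of the expansion of sym,
    descending only into children that overlap the requested range."""
    if start >= end:
        return ""
    if sym not in RULES:
        return sym
    pieces = []
    offset = 0
    for child in RULES[sym]:
        child_len = length(child)
        lo = max(start, offset)
        hi = min(end, offset + child_len)
        if lo < hi:
            pieces.append(extract(child, lo - offset, hi - offset))
        offset += child_len
        if offset >= end:
            break
    return "".join(pieces)

print(grammar_size(RULES), symbol_count(RULES), length("S"), height("S"))
print(extract("S", 0, length("S")))
print(extract("S", 3, 9))
```

Here the start symbol expands to a text of length 12 from a grammar of size 8, and the range query touches only the rules overlapping the requested positions instead of decompressing the whole text.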
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Cellular Automata and Applications
