
TL;DR
This paper introduces a space-efficient text index that supports fast pattern matching and occurrence reporting. Its space is bounded in terms of $r^*$, the total number of runs in the Burrows-Wheeler Transforms (BWTs) of the text and its reverse, and $z$, the number of phrases in the text's LZ77 parse.
Contribution
The authors present an index whose size depends on both the BWT run count $r^*$ and the LZ77 phrase count $z$, and which supports pattern search directly within that compressed space.
Findings
Index size is $O(r^* \log(n/r^*) + z \log n)$ bits.
Pattern occurrences can be reported in $O(m \log n + \mathrm{occ} \log^{\epsilon} n)$ time.
The positions of the leftmost and rightmost occurrences of a pattern can be reported within the same space and time bounds.
Abstract
Let $T[1..n]$ be a text over an alphabet of size $\sigma$, let $r^*$ be the sum of the numbers of runs in the Burrows-Wheeler Transforms of $T$ and its reverse, and let $z$ be the number of phrases in the LZ77 parse of $T$. We show how to store $T$ in $O(r^* \log(n/r^*) + z \log n)$ bits such that, given a pattern $P[1..m]$, we can report the locations of the $\mathrm{occ}$ occurrences of $P$ in $T$ in $O(m \log n + \mathrm{occ} \log^{\epsilon} n)$ time. We can also report the positions of the leftmost and rightmost occurrences of $P$ in $T$ in the same space and time.
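To make the two measures in the bounds concrete, here is a minimal Python sketch that computes them naively for a short string. It is only an illustration of the definitions, not the paper's construction: the function names are hypothetical, the BWT is built by sorting rotations of the text with an appended sentinel (so the sentinel contributes a run), and the LZ77 variant is assumed to be the greedy parse in which each phrase is either a previously unseen character or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). Other conventions can shift the counts by small amounts.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumes the sentinel is smaller than every character in text.
    Quadratic or worse; meant for small illustrative inputs only.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def r_star(text: str) -> int:
    """Sum of the BWT run counts of text and of its reverse."""
    return count_runs(bwt(text)) + count_runs(bwt(text[::-1]))


def lz77_phrase_count(text: str) -> int:
    """Number of phrases in a greedy LZ77 parse (assumed variant, see lead-in)."""
    n = len(text)
    i = 0
    z = 0
    while i < n:
        length = 0
        # Extend the phrase while text[i : i + length + 1] has an occurrence
        # that starts before position i (the copy may overlap the phrase).
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        # A character with no earlier occurrence forms a phrase on its own.
        i += max(length, 1)
        z += 1
    return z


if __name__ == "__main__":
    for example in ("banana", "abababab"):
        print(example, "r* =", r_star(example), "z =", lz77_phrase_count(example))
```

On highly repetitive inputs both counts stay small relative to $n$, which is the regime in which a bound of the form $O(r^* \log(n/r^*) + z \log n)$ bits is attractive.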
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Advanced Database Systems and Queries
