
TL;DR
This paper builds a self-index on top of the Block Tree, a data structure whose compression is close to that of Lempel-Ziv, so that patterns can be searched efficiently using space proportional to that of the Block Tree itself.
Contribution
It presents a self-index built on Block Trees that supports efficient pattern search while using space proportional to that of the underlying Block Tree.
Findings
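Uses O(z log(n/z)) words of space, where n is the text length and z is the number of Lempel-Ziv phrases
Finds the occ occurrences of a pattern of length m in O(m^2 log n + occ log^ε n) time, for any constant ε > 0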
Supports efficient direct access to compressed text
Abstract
The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure. More precisely, if a Lempel-Ziv parse cuts a text of length n into z non-overlapping phrases, then our index uses O(z log(n/z)) words and finds the occ occurrences of a pattern of length m in time O(m^2 log n + occ log^ε n) for any constant ε > 0.
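To make the quantities in the space bound concrete, here is a minimal Python sketch of a greedy Lempel-Ziv parse that counts the phrases z of a text of length n and evaluates z log(n/z). It is an illustration only and not the paper's construction: the function names `lz_parse` and `space_bound_words` are hypothetical, the parse is a naive quadratic-time variant in which a phrase's source must lie entirely in the already parsed prefix, and the bound is computed with base-2 logarithms and without the constant hidden by the O-notation.

```python
import math


def lz_parse(text):
    """Naive greedy Lempel-Ziv parse (illustrative assumption, not the
    paper's algorithm). Each phrase is the longest prefix of the remaining
    text that occurs entirely inside the already parsed part, followed by
    one fresh character when one is available."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Extend the match while text[i:i+length+1] occurs within text[:i].
        while i + length < n and text.find(text[i:i + length + 1], 0, i) != -1:
            length += 1
        phrases.append(text[i:i + length + 1])
        i += length + 1
    return phrases


def space_bound_words(n, z):
    """Evaluate z * log2(n / z), ignoring the constant of the O-notation."""
    return z * math.log2(n / z)


if __name__ == "__main__":
    text = "abababab"
    phrases = lz_parse(text)
    n, z = len(text), len(phrases)
    print(phrases, n, z, space_bound_words(n, z))
```

On highly repetitive texts z grows much more slowly than n, which is why a bound of this form can be far smaller than the text itself.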
