Document Listing on Repetitive Collections with Guaranteed Performance
Gonzalo Navarro

TL;DR
This paper introduces a document listing index for repetitive string collections whose space is proportional, up to polylogarithmic factors, to the length of the base string plus the number of edits. It is the first index of this size with worst-case query-time guarantees.
Contribution
The paper presents the first document listing index with size $\tilde{O}(n+s)$ and worst-case guarantees, along with novel grammar-based indexes for counting pattern occurrences efficiently.
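As a quick check on how the $\tilde{O}(n+s)$ size relates to the precise bound of $O((n\log\sigma+s\log^2 N)\log D)$ bits, assume logarithms in base 2, $N \ge 2$, $\sigma \le N$ (every symbol of the alphabet occurs in the collection), and $D \le N$ (every copy is non-empty). These are assumptions of this sketch. Each logarithmic factor is then at most $\log N$, and $\log N \ge 1$:

$$
(n\log\sigma + s\log^2 N)\log D \;\le\; (n\log N + s\log^2 N)\log N \;\le\; (n+s)\log^3 N \;=\; \tilde{O}(n+s).
$$

The $\tilde{O}$ notation therefore hides at most a $\log^3 N$ factor in bits.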
Findings
Index size is $O((n\log\sigma + s\log^2 N)\log D)$ bits.
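On a text represented by a grammar of size $r$, a new grammar-based index counts the occurrences of a pattern of length $m$ in $O(m^2 + m\log^{2+\epsilon} r)$ time, for any constant $\epsilon > 0$.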
For texts parsed with Lempel-Ziv, a further index counts occurrences in $O(m\log^{2+\epsilon} N)$ time.
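The counting results come from making the two-dimensional range structure of grammar-based indexes return a summary of the points in a query rectangle instead of enumerating them. The following minimal sketch shows that "summary instead of enumeration" idea using a swapped-in structure: a merge-sort tree whose nodes keep the points' y-coordinates sorted together with prefix sums of their weights. With this structure a rectangle sum over $p$ points costs $O(\log^2 p)$. Treating each weight as the number of occurrences contributed by a grid point is a hypothetical illustration. The names `RangeSumGrid` and `range_sum` are invented here. This is not the paper's structure and does not achieve its time bounds.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


class RangeSumGrid:
    """Static set of weighted 2D points answering rectangle-sum queries.

    Merge-sort tree: each node covers a run of points sorted by x and stores
    their y-coordinates in sorted order with prefix sums of the weights.
    Weights are hypothetical stand-ins for per-point occurrence counts.
    """

    def __init__(self, points):
        # points: iterable of (x, y, weight); sorting by x lets nodes cover x-runs.
        pts = sorted(points)
        self.xs = [x for x, _, _ in pts]
        self.size = len(pts)
        self.tree = {}  # node id -> (sorted ys, prefix sums of weights in y order)
        if pts:
            self._build(1, 0, self.size - 1, pts)

    def _build(self, node, lo, hi, pts):
        chunk = sorted((y, w) for _, y, w in pts[lo:hi + 1])
        ys = [y for y, _ in chunk]
        prefix = [0] + list(accumulate(w for _, w in chunk))
        self.tree[node] = (ys, prefix)
        if lo < hi:
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid, pts)
            self._build(2 * node + 1, mid + 1, hi, pts)

    def range_sum(self, x1, x2, y1, y2):
        """Total weight of points with x1 <= x <= x2 and y1 <= y <= y2."""
        lo = bisect_left(self.xs, x1)
        hi = bisect_right(self.xs, x2) - 1
        if lo > hi or y1 > y2:
            return 0
        return self._query(1, 0, self.size - 1, lo, hi, y1, y2)

    def _query(self, node, nlo, nhi, lo, hi, y1, y2):
        if hi < nlo or nhi < lo:  # node's x-run is disjoint from the query
            return 0
        if lo <= nlo and nhi <= hi:  # node fully inside: sum its y-window
            ys, prefix = self.tree[node]
            return prefix[bisect_right(ys, y2)] - prefix[bisect_left(ys, y1)]
        mid = (nlo + nhi) // 2
        return (self._query(2 * node, nlo, mid, lo, hi, y1, y2)
                + self._query(2 * node + 1, mid + 1, nhi, lo, hi, y1, y2))


grid = RangeSumGrid([(1, 5, 3), (2, 2, 1), (4, 7, 2), (6, 3, 4)])
print(grid.range_sum(1, 4, 2, 7))  # 3 + 1 + 2 = 6
print(grid.range_sum(2, 6, 1, 3))  # 1 + 4 = 5
```

The query never lists individual points. It adds one precomputed difference per canonical node, and that is what lets a counting query avoid paying for each occurrence.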
Abstract
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies of a string of size $n$, and $s$ edits are applied on ranges of copies. We introduce the first document listing index with size $\tilde{O}(n+s)$, precisely $O((n\log\sigma+s\log^2 N)\log D)$ bits, and with useful worst-case time guarantees: Given a pattern of length $m$, the index reports the $\mathrm{ndoc}>0$ strings where it appears in time $O(m\log^{1+\epsilon} N\cdot\mathrm{ndoc})$, for any constant $\epsilon>0$ (and tells in time $O(m\log N)$ if $\mathrm{ndoc}=0$). Our technique is to augment a range data structure that is commonly used on grammar-based indexes, so that instead of retrieving all the pattern occurrences, it computes useful summaries on them. We show…
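The following brute-force sketch makes the problem statement concrete. It builds $D$ copies of a base string, applies edits to ranges of copies, and then answers document listing by scanning every document. It assumes that edits are single-character substitutions (so $N = D\cdot n$ here), although the paper's edit model may be more general. The names `RangeEdit`, `build_collection` and `list_documents` are hypothetical. This is the naive baseline that the index is designed to beat, since its cost grows with $N$ rather than with $n+s$. It is not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class RangeEdit:
    """Hypothetical edit: set position `pos` to `char` in documents first..last."""
    first: int  # first affected document (0-based, inclusive)
    last: int   # last affected document (inclusive)
    pos: int    # position in the base string
    char: str   # replacement symbol


def build_collection(base, num_docs, edits):
    """Materialize num_docs copies of `base` and apply each edit to its range.

    Only substitutions are modelled; this is an assumption of the sketch.
    """
    docs = [list(base) for _ in range(num_docs)]
    for e in edits:
        for d in range(e.first, e.last + 1):
            docs[d][e.pos] = e.char
    return ["".join(doc) for doc in docs]


def list_documents(docs, pattern):
    """Brute-force document listing: ids of the documents containing `pattern`."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


docs = build_collection("abracadabra", 5,
                        [RangeEdit(1, 3, 0, "x"), RangeEdit(2, 4, 7, "z")])
print(list_documents(docs, "xbr"))  # [1, 2, 3]
print(list_documents(docs, "dab"))  # [0, 1]
```

Each edit touches a contiguous range of copies, so a compact index only needs to store the base string once plus the $s$ edits. The brute-force version above stores and scans all $N$ symbols.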
