Text Indexing for Long Patterns using Locally Consistent Anchors
Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis

TL;DR
This paper introduces a text index based on locally consistent anchors (lc-anchors) that balances index space, query time, and construction space and time, particularly when a lower bound on the length of the query patterns is known in advance. In experiments it outperforms traditional indexes.
Contribution
It proposes a new index built on lc-anchors that achieves optimal trade-offs across the four key measures (index space, query time, construction space, and construction time). It provides both average-case and worst-case guarantees, which is a first in this setting.
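To make the anchor idea concrete, here is a toy sketch. It assumes minimizers as the anchor scheme, a standard locally consistent sampling method; the paper's own lc-anchors and data structures may differ. With window parameters w and k, each window of w consecutive k-mers selects the start of its smallest k-mer. Any pattern of length at least ℓ = w + k − 1 then contains, within its first ℓ characters, an anchor that is also selected in the text at every occurrence. As a result, only anchor positions need to be indexed. The names `minimizer_anchors`, `AnchorIndex`, `w`, and `k` are hypothetical and are not taken from the paper.

```python
from typing import List


def minimizer_anchors(text: str, w: int, k: int) -> List[int]:
    """Hypothetical helper: in every window of w consecutive k-mers, keep the
    start of the lexicographically smallest k-mer (leftmost on ties)."""
    ell = w + k - 1  # number of text characters spanned by one window
    anchors = set()
    for s in range(len(text) - ell + 1):
        best = min(range(s, s + w), key=lambda p: (text[p:p + k], p))
        anchors.add(best)
    return sorted(anchors)


class AnchorIndex:
    """Toy index (an assumption, not the paper's structure): a sparse suffix
    array holding only anchor positions, sorted by the suffix starting there."""

    def __init__(self, text: str, w: int, k: int):
        self.text, self.w, self.k = text, w, k
        self.ell = w + k - 1  # minimum supported pattern length
        self.sa = sorted(minimizer_anchors(text, w, k), key=lambda a: text[a:])

    def _bound(self, q: str, strict: bool) -> int:
        # First rank whose suffix, truncated to len(q), is >= q (> q if strict).
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            head = self.text[self.sa[mid]:self.sa[mid] + len(q)]
            if head < q or (strict and head == q):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find(self, pattern: str) -> List[int]:
        if len(pattern) < self.ell:
            raise ValueError("pattern shorter than the lower bound ell")
        # The anchor inside the pattern is the minimizer of its first window;
        # it sits at the same offset j in every text occurrence.
        j = min(range(self.w), key=lambda p: (pattern[p:p + self.k], p))
        q = pattern[j:]
        lo, hi = self._bound(q, False), self._bound(q, True)
        hits = []
        for a in self.sa[lo:hi]:
            # Verify the part of the pattern to the left of the anchor.
            if a >= j and self.text[a - j:a] == pattern[:j]:
                hits.append(a - j)
        return sorted(hits)
```

For example, `AnchorIndex("abracadabra", w=2, k=2).find("abra")` returns `[0, 7]`. In this text, the index keeps 6 of the 11 positions. Sorting by the full suffix keeps the truncated prefixes in order, so two binary searches isolate the candidate anchors.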
Findings
Outperforms classic indexes like suffix trees, suffix arrays, and FM-index in experiments.
Offers average-case guarantees for all four measures.
Provides a new index with worst-case guarantees based on lc-anchors.
Abstract
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form but also to simultaneously enable fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, however, most (if not all) widely-used indexes (e.g., suffix tree, suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that…
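For contrast with the anchor-based approach, the suffix array baseline mentioned in the abstract can be sketched as follows. This is a minimal, naive illustration with hypothetical helpers `suffix_array` and `occurrences` and quadratic-time construction, not an engineered library. It stores one integer per text position (n entries for a text of length n) and answers a query with two binary searches, which takes O(m log n) character comparisons for a pattern of length m.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    # Naive construction: sort all suffix start positions by their suffix.
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Starting positions of a non-empty pattern, found by binary search."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # Upper bound: first suffix whose length-m prefix is > pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For `"banana"`, `suffix_array` gives `[5, 3, 1, 0, 4, 2]` and `occurrences("banana", sa, "ana")` gives `[1, 3]`.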
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Text and Document Classification Technologies
