Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
Nicola Prezza

TL;DR
This paper introduces a space-efficient string encoding that answers both access and substring-equality queries in constant time, enabling in-place, subquadratic algorithms for various string-processing tasks, including sparse suffix sorting and LCP array construction.
Contribution
It presents a new optimal encoding supporting efficient substring equality queries and applies it to develop the first in-place subquadratic algorithms for key string problems.
Findings
Achieves optimal $O(1)$ query time with minimal redundancy
Provides the first in-place subquadratic algorithms for sparse suffix sorting and LCP array construction
Develops the first sublinear-time algorithms for small sets of suffixes and builds sparse suffix trees efficiently
Abstract
We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log(n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma \leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in-place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore…
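To make the query in the abstract concrete, here is a minimal Python sketch of substring-equality queries using Karp-Rabin prefix fingerprints. It is an illustration only, not the paper's data structure: it stores two auxiliary arrays of $n+1$ integers rather than overwriting the text in place, and its answers are correct with high probability rather than deterministically. The class name `PrefixFingerprints` and its methods `fingerprint`, `equal`, and `lce` are hypothetical names chosen for this example.

```python
import random


class PrefixFingerprints:
    """Monte Carlo substring equality via Karp-Rabin prefix fingerprints.

    Illustrative sketch only: uses O(n) extra words, unlike an in-place encoding.
    """

    MOD = (1 << 61) - 1  # a Mersenne prime used as the fingerprint modulus

    def __init__(self, text, seed=None):
        rng = random.Random(seed)
        self.n = len(text)
        # Random base; a collision between two different substrings is unlikely.
        self.base = rng.randrange(256, self.MOD - 1)
        self.pref = [0] * (self.n + 1)  # pref[i] = fingerprint of text[:i]
        self.pw = [1] * (self.n + 1)    # pw[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.pref[i + 1] = (self.pref[i] * self.base + ord(ch) + 1) % self.MOD
            self.pw[i + 1] = (self.pw[i] * self.base) % self.MOD

    def fingerprint(self, i, length):
        """Fingerprint of text[i : i + length], computed in O(1) arithmetic steps."""
        return (self.pref[i + length] - self.pref[i] * self.pw[length]) % self.MOD

    def equal(self, i, j, length):
        """True if text[i : i + length] == text[j : j + length] (with high probability)."""
        return self.fingerprint(i, length) == self.fingerprint(j, length)

    def lce(self, i, j):
        """Longest common extension of suffixes i and j, by binary search on equal()."""
        lo, hi = 0, self.n - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.equal(i, j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


fp = PrefixFingerprints("abracadabra", seed=1)
print(fp.equal(0, 7, 4))  # compares "abra" at positions 0 and 7
print(fp.lce(0, 3))       # common prefix length of "abracadabra" and "acadabra"
```

Longest-common-extension queries of this kind are the comparison primitive behind suffix sorting and LCP computation, which is why a constant-time substring-equality oracle translates into fast algorithms for those problems.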
