Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

Nicola Prezza

arXiv:1803.01723·cs.DS·May 12, 2020


TL;DR

This paper introduces a space-efficient data structure for substring equality queries that supports constant-time access, enabling in-place, subquadratic algorithms for various string processing tasks, including suffix sorting and LCP array construction.

Contribution

It presents a new optimal encoding supporting efficient substring equality queries and applies it to develop the first in-place subquadratic algorithms for key string problems.

Findings

01

Achieves optimal $O(1)$ query time with minimal redundancy

02

Provides the first in-place subquadratic algorithms for sparse suffix sorting and LCP array construction

03

Develops the first sublinear-time algorithms for small sets of suffixes and builds sparse suffix trees efficiently
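To illustrate how substring-equality queries connect to sparse suffix sorting, here is a minimal sketch of a standard reduction: the longest common extension (LCE) of two suffixes is found by binary search over equality queries, and the suffixes are then ordered by the first mismatching character. This is not the paper's in-place algorithm. The names `lce`, `sparse_suffix_sort`, and `naive_equal` are hypothetical, and the oracle used here compares slices directly, so it stands in for a constant-time structure only for demonstration.

```python
from functools import cmp_to_key


def naive_equal(text):
    """Return a stand-in equality oracle (slice comparison, not O(1))."""
    def equal(i, j, length):
        return text[i:i + length] == text[j:j + length]
    return equal


def lce(text, i, j, equal):
    """Length of the longest common prefix of text[i:] and text[j:].

    Binary search is valid because equality of length-m prefixes
    implies equality of all shorter prefixes.
    """
    lo, hi = 0, len(text) - max(i, j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if equal(i, j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def sparse_suffix_sort(text, positions, equal=None):
    """Sort the suffixes starting at `positions` lexicographically."""
    if equal is None:
        equal = naive_equal(text)  # assumption: replace with a fast oracle

    def compare(i, j):
        if i == j:
            return 0
        common = lce(text, i, j, equal)
        a, b = i + common, j + common
        if a == len(text):  # suffix i is a proper prefix of suffix j
            return -1
        if b == len(text):  # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[a] < text[b] else 1

    return sorted(positions, key=cmp_to_key(compare))
```

With an oracle that answers each query in constant time, every comparison in this sketch costs $O(\log n)$ queries, so sorting $m$ suffixes uses $O(m \log m \log n)$ queries. These are the costs of this sketch only, not the bounds claimed in the paper.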

Abstract

We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n \log \sigma + \Theta(\log(n \log \sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma \leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in-place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore…
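As a concrete picture of what a substring-equality query is, the sketch below answers such queries in constant time after linear preprocessing, using prefix Karp-Rabin fingerprints. This is a textbook technique chosen for illustration and should not be read as the structure described in the abstract: it keeps $O(n)$ extra machine words rather than $\Theta(\log n)$ bits of redundancy, it does not overwrite the string in place, and its answers are correct only with high probability. The class name `SubstringEquality`, the modulus, and the seed are assumptions.

```python
import random


class SubstringEquality:
    """Monte Carlo substring equality via prefix Karp-Rabin fingerprints."""

    def __init__(self, text, seed=0):
        self.n = len(text)
        self.q = (1 << 61) - 1  # Mersenne prime modulus (an assumption)
        rng = random.Random(seed)
        self.base = rng.randrange(256, self.q - 1)
        # prefix[i] is the fingerprint of text[0:i]; power[i] is base**i mod q.
        self.prefix = [0] * (self.n + 1)
        self.power = [1] * (self.n + 1)
        for i, ch in enumerate(text):
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(ch) + 1) % self.q
            self.power[i + 1] = (self.power[i] * self.base) % self.q

    def fingerprint(self, i, length):
        """Fingerprint of text[i:i+length], computed in O(1) arithmetic steps."""
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % self.q

    def equal(self, i, j, length):
        """True if text[i:i+length] == text[j:j+length], up to hash collisions."""
        if i + length > self.n or j + length > self.n:
            raise IndexError("substring out of range")
        return self.fingerprint(i, length) == self.fingerprint(j, length)


se = SubstringEquality("abracadabra")
print(se.equal(0, 7, 4))  # compares "abra" with "abra"
```

Equal substrings always produce equal fingerprints, so a `False` answer is always correct; a `True` answer can be wrong only if two different substrings collide modulo the chosen prime.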
