TL;DR
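This paper introduces a universal text indexing framework for long patterns. A sketch of the query pattern is matched against a sketch of the text using an existing index structure, and the candidate matches are then verified against the original text. The result needs only a small amount of extra space on top of the text and is fast for sufficiently long queries.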
Contribution
The authors propose a novel universal indexing paradigm using pattern and text sketches, enabling faster construction and querying for long patterns across various index types.
Findings
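Universal indexes are faster to construct than traditional indexes.
They require significantly less space, because the index is built over a sketch of the text rather than the full text.
Query times are maintained or improved with sketch-based matching, provided the patterns are sufficiently long.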
Abstract
Text indexing is a fundamental and well-studied problem. Classic solutions either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an index - to accelerate matching. The former solutions thus retain excellent compressed space, but are slow in practice. The latter approaches, like the suffix array, instead sacrifice space for speed. We show that efficient text indexing can be achieved using just a small extra space on top of the original text, provided that the query patterns are sufficiently long. More specifically, we develop a new indexing paradigm in which a sketch of a query pattern is first matched against a sketch of the text. Once candidate matches are retrieved, they are verified using the original text. This paradigm is thus universal in the sense that it allows us to use any…
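The sketch-then-verify idea described in the abstract can be illustrated with a small Python example. This is only a sketch under stated assumptions, not the authors' construction: it uses minimizer sampling as a stand-in sketching scheme, a plain dictionary as the index over the sketch, and hypothetical names (`window_minimizer`, `build_sketch_index`, `find_occurrences`) and parameters (`w`, `k`) chosen for illustration. Patterns must have length at least `w + k - 1` so that they contain at least one full window.

```python
from collections import defaultdict


def window_minimizer(s, start, w, k):
    """Position of the smallest k-mer among the w k-mers beginning at
    start, start+1, ..., start+w-1 (leftmost position wins ties)."""
    return min(range(start, start + w), key=lambda i: (s[i:i + k], i))


def build_sketch_index(text, w, k):
    """Sample one minimizer per window of the text and index the sampled
    positions by their k-mer. Assumption: minimizers play the role of the
    text sketch; the paper's actual sketch may differ."""
    sampled = set()
    for start in range(len(text) - (w + k - 1) + 1):
        sampled.add(window_minimizer(text, start, w, k))
    index = defaultdict(list)
    for pos in sorted(sampled):
        index[text[pos:pos + k]].append(pos)
    return index


def find_occurrences(text, index, pattern, w, k):
    """Match the pattern's sketch against the text sketch, then verify
    every candidate against the original text."""
    if len(pattern) < w + k - 1:
        raise ValueError("pattern is too short for the chosen w and k")
    # Sketch of the pattern: the minimizer of its first window.
    j = window_minimizer(pattern, 0, w, k)
    anchor = pattern[j:j + k]
    hits = []
    for pos in index.get(anchor, []):
        start = pos - j  # candidate starting position in the text
        if start >= 0 and text[start:start + len(pattern)] == pattern:
            hits.append(start)  # verified against the original text
    return hits


text = "abracadabra_abracadabra"
index = build_sketch_index(text, w=3, k=2)
print(find_occurrences(text, index, "racadabra", w=3, k=2))
```

No occurrence is missed in this toy scheme: if the pattern occurs at some position, its first window is also a window of the text, so the same minimizer was sampled when the index was built. Only the sampled positions are stored, which is where the space saving over indexing every position comes from.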
