Fingerprints in Compressed Strings
Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Benjamin Sach, Hjalte Wedel Vildhøj, Søren Vind

TL;DR
This paper introduces space-efficient data structures for computing Karp-Rabin fingerprints on compressed strings, enabling fast substring queries without decompression, and improves solutions for the longest common extension problem.
Contribution
It presents the first O(n) space data structures that answer fingerprint queries on grammar-compressed strings without decompressing any characters, with the same time and space complexity as random access in SLPs.
Findings
O(n) space data structures for fingerprint queries
O(log N) query time for SLPs
O(log log N) query time for Linear SLPs
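To make the idea of a fingerprint query on an SLP concrete, here is a minimal Python sketch. It is a naive baseline and not the paper's data structure: each rule stores the length and fingerprint of the string it derives, and a prefix fingerprint is found by walking down the derivation tree, so a query costs time proportional to the height of the grammar rather than the bounds listed above. The modulus `P`, the base `C`, and all class and function names are assumptions made for illustration.

```python
# Assumed parameters (not taken from the paper): a prime modulus and a base.
P = (1 << 61) - 1
C = 1_000_003

class Node:
    """Hypothetical SLP rule: a single character or a concatenation of two rules."""
    def __init__(self, char=None, left=None, right=None):
        self.char, self.left, self.right = char, left, right
        if char is not None:
            self.length = 1
            self.fp = ord(char) % P
        else:
            self.length = left.length + right.length
            # Fingerprints compose: fp(XY) = fp(X) * C^|Y| + fp(Y) (mod P).
            self.fp = (left.fp * pow(C, right.length, P) + right.fp) % P

def prefix_fp(node, k):
    """Fingerprint of the first k characters derived from node, without expanding it."""
    fp = 0
    while k > 0:
        if k == node.length:
            # The whole rule is covered: use its stored fingerprint.
            return (fp * pow(C, k, P) + node.fp) % P
        if k <= node.left.length:
            node = node.left
        else:
            # Absorb the entire left child, then continue in the right child.
            fp = (fp * pow(C, node.left.length, P) + node.left.fp) % P
            k -= node.left.length
            node = node.right
    return fp

def substring_fp(root, i, j):
    """Fingerprint of S[i:j] (0-indexed, half-open) from two prefix fingerprints."""
    return (prefix_fp(root, j) - prefix_fp(root, i) * pow(C, j - i, P)) % P

def direct_fp(s):
    """Reference fingerprint computed on an uncompressed string."""
    fp = 0
    for ch in s:
        fp = (fp * C + ord(ch)) % P
    return fp

# Example grammar deriving "ababab": X1 = ab, X2 = X1 X1, X3 = X2 X1.
a, b = Node(char="a"), Node(char="b")
x1 = Node(left=a, right=b)
x2 = Node(left=x1, right=x1)
x3 = Node(left=x2, right=x1)
```

With this grammar, `substring_fp(x3, 1, 5)` equals `direct_fp("baba")`, computed without ever materialising the string. The sketch only shows why fingerprints are compatible with grammar compression. Reaching O(log N) or O(log log N) query time in O(n) space requires the additional machinery described in the paper.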
Abstract
The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i, j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension…
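The link between fingerprint queries and longest common extension (LCE) queries can be sketched in a few lines of Python. This is a minimal illustration on an uncompressed string, where a plain prefix-fingerprint array stands in for the compressed data structure. The modulus `P`, the base `C`, and the function names are assumptions, and the exponential-then-binary search is a standard technique rather than a transcription of the paper's algorithm. Because fingerprints can collide, the result is correct only with high probability.

```python
# Assumed parameters (not taken from the paper): a prime modulus and a base.
P = (1 << 61) - 1
C = 1_000_003

def build_prefix_fps(s):
    """Prefix fingerprints fps[k] = fingerprint of s[:k], plus powers of C."""
    fps, pows = [0], [1]
    for ch in s:
        fps.append((fps[-1] * C + ord(ch)) % P)
        pows.append(pows[-1] * C % P)
    return fps, pows

def fingerprint(fps, pows, i, j):
    """Fingerprint of s[i:j] (0-indexed, half-open)."""
    return (fps[j] - fps[i] * pows[j - i]) % P

def lce(fps, pows, i, j):
    """Length of the longest common prefix of s[i:] and s[j:], via fingerprints."""
    max_len = (len(fps) - 1) - max(i, j)

    def match(length):
        return fingerprint(fps, pows, i, i + length) == fingerprint(fps, pows, j, j + length)

    # Exponential search: double the length while the two substrings still agree.
    lo, hi = 0, 1
    while hi <= max_len and match(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, max_len + 1)

    # Binary search: length lo matches, length hi does not (or is out of range).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if match(mid):
            lo = mid
        else:
            hi = mid
    return lo

fps, pows = build_prefix_fps("abracadabra")
```

For the string `"abracadabra"`, `lce(fps, pows, 0, 7)` compares the suffixes starting at positions 0 and 7 and returns 4, the length of the shared prefix `"abra"`. The search issues a number of fingerprint queries logarithmic in the answer, so the cost of one fingerprint query on the compressed representation largely determines the cost of an LCE query.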
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · semigroups and automata theory
