A fast and simple $O (z \log n)$-space index for finding approximately longest common substrings
Nick Fagan, Jorge Hermo González, Travis Gagie

TL;DR
This paper introduces a space-efficient index for large texts that enables approximate longest common substring searches with high probability, using only $O(z \, \log n)$ space where $z$ is the LZ77 parse size.
Contribution
It presents an index that supports approximate longest common substring (LCS) queries in $O(z \log n)$ space, that is, space proportional to the LZ77 parse size up to a logarithmic factor.
Findings
Index uses $O(z \log n)$ space.
Query time is $O(m \log \log z + \mathrm{polylog}(m+z))$ with high probability.
Returns a common substring whose length is at least a $(1 - \epsilon)$-fraction of the length of a longest common substring, for any fixed constant $\epsilon > 0$.
Abstract
We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O (m \log \log z + \mathrm{polylog} (m + z))$ time and with high probability we can find a substring of $P$ that occurs in $T$ and whose length is at least a $(1 - \epsilon)$-fraction of the length of a longest common substring of $P$ and $T$.
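To make the quantities in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the paper's index and makes no attempt at its space or time bounds. It only illustrates what $z$ counts (under one common LZ77 variant that allows overlapping sources, which is an assumption here) and what it means for an answer to be within a $(1 - \epsilon)$-fraction of a longest common substring. All function names are hypothetical.

```python
def lz77_phrase_count(text: str) -> int:
    """Count phrases in a naive greedy LZ77-style parse (quadratic or worse).

    Assumed variant: each phrase is the longest prefix of the remaining
    text that also starts at an earlier position (overlap allowed), or a
    single fresh character when no such prefix exists.
    """
    n = len(text)
    i = 0
    z = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            longest = max(longest, k)
        i += max(longest, 1)  # a new character forms a phrase of length 1
        z += 1
    return z


def longest_common_substring(p: str, t: str) -> str:
    """Exact longest common substring by dynamic programming, O(|p| * |t|)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(p) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if p[i - 1] == t[j - 1]:
                # Extend the common suffix ending at p[i-1] and t[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return p[best_end - best_len:best_end]


def is_acceptable(candidate: str, p: str, t: str, eps: float) -> bool:
    """Check the (1 - eps) guarantee for a candidate answer."""
    exact = longest_common_substring(p, t)
    return (
        candidate in p
        and candidate in t
        and len(candidate) >= (1 - eps) * len(exact)
    )
```

For example, with pattern `"xabcdy"` and text `"zzabcdzz"` the exact answer is `"abcd"`, so with `eps = 0.25` the shorter common substring `"abc"` would still count as an acceptable answer, while `"ab"` would not.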
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
