Streaming dictionary matching with mismatches
Paweł Gawrychowski, Tatiana Starikovskaya

TL;DR
This paper extends efficient streaming algorithms for the $k$-mismatch problem to the strictly harder problem of dictionary matching with $k$ mismatches. For a dictionary of $d$ patterns, it gives a randomized streaming algorithm with explicit space and per-position time bounds, together with an $\Omega(kd)$-bit space lower bound.
Contribution
It introduces a novel streaming algorithm for dictionary matching with $k$ mismatches and proves a lower bound on space complexity for this problem.
Findings
Developed a randomized streaming algorithm with $O(kd \log^{k} d \,\mathrm{polylog}(n))$ space.
Achieved $O(k \log^{k} d \,\mathrm{polylog}(n) + |\mathrm{occ}|)$ time per position of the text.
Proved a lower bound of $\Omega(kd)$ bits of space for any streaming algorithm.
Abstract
In the $k$-mismatch problem we are given a pattern of length $n$ and a text and must find all locations where the Hamming distance between the pattern and the text is at most $k$. A series of recent breakthroughs have resulted in an ultra-efficient streaming algorithm for this problem that requires only $O(k \log \frac{n}{k})$ space and $O(\log \frac{n}{k} (\sqrt{k \log k} + \log^3 n))$ time per letter [Clifford, Kociumaka, Porat, SODA 2019]. In this work, we consider a strictly harder problem called dictionary matching with $k$ mismatches. In this problem, we are given a dictionary of $d$ patterns, where the length of each pattern is at most $n$, and must find all substrings of the text that are within Hamming distance $k$ from one of the patterns. We develop a streaming algorithm for this problem with $O(kd \log^{k} d \,\mathrm{polylog}(n))$ space and $O(k \log^{k} d \mathrm{polylog}(n)…
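To make the problem statement concrete, here is a minimal brute-force sketch of dictionary matching with $k$ mismatches. It is not the paper's streaming algorithm: it keeps the whole text in memory and compares every pattern at every position, so it serves only as a reference for what must be reported. The function names and the output convention (0-indexed end position of the substring paired with the pattern's index) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:  # stop early once the budget is exceeded
                return False
    return True


def naive_dictionary_k_mismatch(
    text: str, patterns: Sequence[str], k: int
) -> List[Tuple[int, int]]:
    """Naive, non-streaming baseline (hypothetical helper, not from the paper).

    Reports (end_position, pattern_index) for every substring of `text`
    ending at `end_position` (0-indexed) that is within Hamming distance k
    of patterns[pattern_index].
    """
    occurrences = []
    for end in range(1, len(text) + 1):  # `end` is exclusive here
        for idx, pattern in enumerate(patterns):
            m = len(pattern)
            if m <= end and hamming_at_most(text[end - m:end], pattern, k):
                occurrences.append((end - 1, idx))
    return occurrences


if __name__ == "__main__":
    print(naive_dictionary_k_mismatch("abcabd", ["abc", "bd"], 1))
```

This baseline spends time proportional to the total pattern length at each text position and stores the full text and dictionary. A streaming algorithm instead reads the text one letter at a time and must report the same occurrences while keeping only a small sketch of what it has seen.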
