The k-mismatch problem revisited
Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, Tatiana Starikovskaya

TL;DR
This paper revisits the k-mismatch pattern matching problem, providing improved algorithms for offline and streaming settings, with better time and space complexities, including approximation methods.
Contribution
It introduces new deterministic and randomized algorithms for k-mismatch in both the offline and streaming models, including a deterministic offline algorithm that improves on the fastest previous result of its form by a factor of k.
Findings
Deterministic offline algorithm with $O(n k^2 \log k / m + n\,\text{polylog}\, m)$ time.
Randomized online algorithm with similar time complexity and $O(k^2\,\text{polylog}\, m)$ space.
Approximate streaming algorithm with $O(k^2\,\text{polylog}\, m / \epsilon^2)$ space and $O(\text{polylog}\, m / \epsilon^2)$ time per symbol.
Abstract
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows: 1) Our first result is a deterministic $O(n k^2 \log k / m + n\,\text{polylog}\, m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood…
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
