Longest common substrings with k mismatches

Tomas Flouri; Emanuele Giaquinta; Kassian Kobert; Esko Ukkonen

arXiv:1409.1694·cs.DS·April 8, 2015


TL;DR

This paper presents a practical $O(nm)$-time, constant-space algorithm for finding the longest common substring of two strings with up to k mismatches, and a theoretically faster $O(n \log m)$-time solution for the case k=1.

Contribution

It introduces a practical $O(nm)$ time, $O(1)$ space algorithm for the longest common substring with k mismatches, and a faster $O(n \log m)$ solution for k=1.

Findings

01

Practical $O(nm)$ time, $O(1)$ space algorithm for general k.

02

Theoretical $O(n \log m)$ time solution for k=1.

03

For k=1, improves over the existing $O(nm)$ time and $O(m)$ space bound of Babenko and Starikovskaya.

Abstract

The longest common substring with $k$-mismatches problem is to find, given two strings $S_{1}$ and $S_{2}$, a longest substring $A_{1}$ of $S_{1}$ and $A_{2}$ of $S_{2}$ such that the Hamming distance between $A_{1}$ and $A_{2}$ is $\leq k$. We introduce a practical $O(nm)$ time and $O(1)$ space solution for this problem, where $n$ and $m$ are the lengths of $S_{1}$ and $S_{2}$, respectively. This algorithm can also be used to compute the matching statistics with $k$-mismatches of $S_{1}$ and $S_{2}$ in $O(nm)$ time and $O(m)$ space. Moreover, we also present a theoretical solution for the $k = 1$ case which runs in $O(n \log m)$ time, assuming $m \leq n$, and uses $O(m)$ space, improving over the existing $O(nm)$ time and $O(m)$ space bound of Babenko and Starikovskaya.
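To make the $O(nm)$ time, $O(1)$ space bound concrete, here is a minimal Python sketch of one way to reach it: scan every alignment (diagonal) of the two strings with a two-pointer window that never holds more than $k$ mismatches. This is an illustration written from the problem statement in the abstract, not code from the paper; the function name, the return convention, and the tie-breaking rule are assumptions.

```python
def longest_common_substring_k_mismatches(s1, s2, k):
    """Return (length, i, j) such that s1[i:i+length] and s2[j:j+length]
    have Hamming distance <= k and length is maximal.

    Illustrative sketch only: the name and return shape are hypothetical.
    Ties are broken in favour of the first window found.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # A diagonal d fixes the alignment: s1[i] is compared with s2[i - d].
    for d in range(-(m - 1), n):
        i0 = max(d, 0)               # first index of s1 on this diagonal
        j0 = i0 - d                  # corresponding first index of s2
        length = min(n - i0, m - j0)
        left = 0                     # window start, as an offset on the diagonal
        mismatches = 0               # mismatches inside the current window
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(longest_common_substring_k_mismatches("abcde", "xbcdy", 0))
    print(longest_common_substring_k_mismatches("abcde", "xbcdy", 1))
```

The diagonals have total length $nm$, and on each one both pointers only move forward, which gives the $O(nm)$ running time. Beyond the input, the sketch keeps only a handful of integers, which matches the $O(1)$ space claim.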
