Longest common substring with approximately $k$ mismatches
Tomasz Kociumaka, Jakub Radoszewski, and Tatiana Starikovskaya

TL;DR
This paper studies the computational complexity of the longest common substring with $k$ mismatches problem, introduces an approximate variant solved with locality-sensitive hashing, and gives algorithms with strongly subquadratic running time and approximation guarantees.
Contribution
The paper introduces the longest common substring with approximately $k$ mismatches problem, gives a strongly subquadratic solution based on locality-sensitive hashing, and establishes conditional hardness results that limit further improvements.
Findings
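A conditional lower bound, based on SETH, for the longest common substring with $k$ mismatches problem.
A strongly subquadratic-time 2-approximation algorithm for the longest common substring with $k$ mismatches problem.
Conditional hardness of improving the approximation ratio.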
Abstract
In the longest common substring problem, we are given two strings of length $n$ and must find a substring of maximal length that occurs in both strings. It is well known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one character. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with $k$ mismatches. Lately, this problem has received a lot of attention in the literature. In this paper, we first show a conditional lower bound based on the SETH hypothesis implying that there is little hope to improve existing solutions. We then introduce a new but closely related problem of the longest common substring with approximately $k$ mismatches and use locality-sensitive hashing to show that it admits a solution with strongly subquadratic running time.…
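To make the problem statement concrete, the following is a minimal sketch of the exact longest common substring with $k$ mismatches problem, solved by a simple quadratic-time scan over all alignments of the two strings. It is an illustrative baseline only, not the paper's locality-sensitive hashing algorithm; the function name `lcs_k_mismatches` and its return convention are assumptions made for this example.

```python
from collections import deque


def lcs_k_mismatches(s, t, k):
    """Return (length, i, j) such that s[i:i+length] and t[j:j+length]
    have equal length and differ in at most k positions, with length maximal.

    Hypothetical helper for illustration: a quadratic-time baseline.
    """
    n, m = len(s), len(t)
    best = (0, 0, 0)
    # Each diagonal d = j - i aligns s[i] with t[i + d].
    for d in range(-(n - 1), m):
        i0 = max(0, -d)           # first index of s on this diagonal
        j0 = i0 + d               # matching first index of t
        length = min(n - i0, m - j0)
        mismatches = deque()      # offsets of mismatches inside the window
        left = 0                  # left end of the current window (offset)
        for right in range(length):
            if s[i0 + right] != t[j0 + right]:
                mismatches.append(right)
                if len(mismatches) > k:
                    # Drop the oldest mismatch to restore at most k mismatches.
                    left = mismatches.popleft() + 1
            if right - left + 1 > best[0]:
                best = (right - left + 1, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "abxde", 1))
    print(lcs_k_mismatches("abcde", "abxde", 0))
```

With $k = 0$ this reduces to the classical longest common substring problem. The sliding window keeps at most $k$ mismatches on each diagonal, so the total work is proportional to the product of the two string lengths, which is the quadratic behaviour that the conditional lower bound in the abstract concerns.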
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Cholangiocarcinoma and Gallbladder Cancer Studies
