Approximating longest common substring with $k$ mismatches: Theory and practice
Garance Gourdel, Tomasz Kociumaka, Jakub Radoszewski, Tatiana Starikovskaya

TL;DR
This paper introduces new approximation algorithms for the longest common substring with $k$ mismatches problem, combining theoretical efficiency gains with an experimental evaluation, and supports their near-optimality with a conditional lower bound.
Contribution
The work presents novel approximation algorithms that are both theoretically efficient and practically effective, improving upon previous solutions for the problem.
Findings
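The new approximation algorithms are significantly more efficient than previously known solutions from a theoretical point of view.
An experimental evaluation confirms that the approach is simple and practical.
A conditional lower bound indicates that the approach is probably close to optimal.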
Abstract
In the problem of the longest common substring with $k$ mismatches we are given two strings $X, Y$ and must find the maximal length $\ell$ such that there is a length-$\ell$ substring of $X$ and a length-$\ell$ substring of $Y$ that differ in at most $k$ positions. The length $\ell$ can be used as a robust measure of similarity between $X, Y$. In this work, we develop new approximation algorithms for computing $\ell$ that are significantly more efficient than previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.
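To make the problem statement concrete, here is a minimal brute-force sketch that computes the exact answer. It is not the approximation algorithm developed in the paper: it is a quadratic-time baseline that slides a window along every alignment (diagonal) of the two strings and keeps at most k mismatches inside the window. The function name `lcs_k_mismatches` and its parameters are hypothetical, chosen only for illustration.

```python
def lcs_k_mismatches(x: str, y: str, k: int) -> int:
    """Exact longest common substring length with at most k mismatches.

    Illustrative O(len(x) * len(y)) baseline, not the paper's method.
    """
    n, m = len(x), len(y)
    best = 0
    # Each diagonal d = i - j fixes one alignment of x against y.
    for d in range(-(m - 1), n):
        i = max(d, 0)              # start index in x on this diagonal
        j = i - d                  # matching start index in y
        length = min(n - i, m - j) # number of aligned character pairs
        left = 0
        mismatches = 0
        for right in range(length):
            if x[i + right] != y[j + right]:
                mismatches += 1
            # Shrink the window from the left until it holds at most k mismatches.
            while mismatches > k:
                if x[i + left] != y[j + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


print(lcs_k_mismatches("abcde", "abxde", 0))  # 2
print(lcs_k_mismatches("abcde", "abxde", 1))  # 5
```

With a budget of zero mismatches the function reduces to the classical longest common substring; allowing a single mismatch lets the two example strings match along their full length, which is why the measure is described as robust.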
Taxonomy
Topics: Algorithms and Data Compression · Wireless Communication Networks Research · Cellular Automata and Applications
