Approximating longest common substring with $k$ mismatches: Theory and practice

Garance Gourdel; Tomasz Kociumaka; Jakub Radoszewski; Tatiana Starikovskaya

arXiv:2004.13389·cs.DS·April 29, 2020·1 cites

Open Access

TL;DR

This paper introduces new approximation algorithms for the longest common substring with $k$ mismatches problem, confirms their practicality through an experimental evaluation, and argues via a conditional lower bound that the approach is probably close to optimal.

Contribution

The work presents novel approximation algorithms that are both theoretically efficient and practically effective, improving upon previous solutions for the problem.

Findings

01

The algorithms are significantly more efficient than previously known solutions from the theoretical point of view.

02

Experimental results confirm practical efficiency and effectiveness.

03

A conditional lower bound suggests the approach is probably close to optimal.

Abstract

In the problem of the longest common substring with $k$ mismatches we are given two strings $X, Y$ and must find the maximal length $ℓ$ such that there is a length-$ℓ$ substring of $X$ and a length-$ℓ$ substring of $Y$ that differ in at most $k$ positions. The length $ℓ$ can be used as a robust measure of similarity between $X, Y$. In this work, we develop new approximation algorithms for computing $ℓ$ that are significantly more efficient than previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.
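
To make the problem definition concrete, the following is a minimal sketch of an exact quadratic-time baseline that computes $ℓ$ directly from the definition. It is not one of the approximation algorithms developed in the paper; the function name `lcs_k_mismatches` and its signature are assumptions chosen for illustration. It slides a window along every alignment (diagonal) of $X$ against $Y$ and keeps the longest window containing at most $k$ mismatches.

```python
def lcs_k_mismatches(x: str, y: str, k: int) -> int:
    """Exact baseline (hypothetical helper, not the paper's method).

    Returns the maximal length l such that some length-l substring of x
    and some length-l substring of y differ in at most k positions.
    Runs in O(len(x) * len(y)) time.
    """
    n, m = len(x), len(y)
    best = 0
    # Each diagonal d = i - j fixes one alignment of x against y.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)    # first aligned position in x
        j0 = max(-d, 0)   # first aligned position in y
        length = min(n - i0, m - j0)
        left = 0          # left end of the current window on this diagonal
        mismatches = 0    # mismatches inside the current window
        for right in range(length):
            if x[i0 + right] != y[j0 + right]:
                mismatches += 1
            # Shrink the window until it has at most k mismatches again.
            while mismatches > k:
                if x[i0 + left] != y[j0 + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    # "abcde" and "xbcdy" share "bcd" exactly; each extra allowed
    # mismatch extends the common window by one position here.
    for k in range(3):
        print(k, lcs_k_mismatches("abcde", "xbcdy", k))
```

A baseline like this is useful mainly as a correctness reference on small inputs when experimenting with faster approximate methods.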

Taxonomy

Topics: Algorithms and Data Compression · Wireless Communication Networks Research · Cellular Automata and Applications