Improved Algorithms for Approximate String Matching (Extended Abstract)

Dimitris Papamichail; Georgios Papamichail

arXiv:0807.4368 · cs.DS · July 29, 2008 · 1 cites

TL;DR

This paper introduces a new output-sensitive algorithm for approximate string matching that improves theoretical bounds and performs well in practice, especially for strings with significant length differences.

Contribution

The paper presents an output-sensitive algorithm that computes the edit distance between two strings of lengths n and m in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance itself.

Findings

01 Achieves worst-case time complexity O((s-|n-m|)min(m,n,s)+m+n), where n and m are the string lengths and s is their edit distance

02 Outperforms existing algorithms when the two strings differ greatly in length

03 Source code is publicly available for implementation and testing
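To make the first two findings concrete, the bound can be evaluated on a few inputs. The numbers below are illustrative assumptions, not results from the paper, and constant factors are ignored. The only fact used beyond the stated bound is that s ≥ |n-m| always holds, because any edit script must insert or delete at least |n-m| characters.

```latex
\[
T(n, m, s) = (s - |n - m|)\,\min(m, n, s) + m + n, \qquad s \ge |n - m|
\]
% Case A: large length difference, shorter string is a subsequence of the longer one
\[
n = 1000,\; m = 10,\; s = 990: \quad (990 - 990)\cdot\min(10, 1000, 990) + 1010 = 1010
\]
% Case B: equal lengths, small edit distance
\[
n = m = 1000,\; s = 10: \quad (10 - 0)\cdot\min(1000, 1000, 10) + 2000 = 2100
\]
% Case C: equal lengths, maximal edit distance
\[
n = m = 1000,\; s = 1000: \quad (1000 - 0)\cdot\min(1000, 1000, 1000) + 2000 = 1\,002\,000
\]
```

In Case A the quadratic term vanishes entirely, which is consistent with the claim that the algorithm does best when the length difference accounts for most of the edit distance.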

Abstract

Approximate string matching is important in many areas, such as computational biology, text processing, and pattern recognition. Great effort has gone into designing efficient algorithms for several variants of the problem, including comparing two strings, identifying approximate patterns in a string, and computing the longest common subsequence of two strings. We designed an output-sensitive algorithm that solves the edit distance problem for two strings of lengths n and m in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings. This worst-case time bound makes the quadratic factor of the algorithm independent of the length of the longer string and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in…
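As a rough illustration of what "output-sensitive" means here, the sketch below uses the classic banded dynamic program with a doubling threshold. This is a well-known baseline technique, not the algorithm from this paper, and it does not reach the paper's bound: it runs in roughly O(s · min(n, m)) time. The function names `bounded_edit_distance` and `edit_distance` are hypothetical and chosen only for this example.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, k: int) -> Optional[int]:
    """Return the edit distance of a and b if it is <= k, otherwise None.

    Only cells (i, j) with |i - j| <= k are evaluated, stored by diagonal
    offset d = j - i + k, so each row costs O(k) time and space.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the distance is at least |n - m|
    inf = k + 1  # any value above k means "outside the threshold"
    width = 2 * k + 1

    # Row 0: cell(0, j) = j for the columns that fall inside the band.
    prev = [inf] * width
    for d in range(width):
        j = d - k
        if 0 <= j <= m:
            prev[d] = j

    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue
            if j == 0:
                best = i  # delete the first i characters of a
            else:
                # Diagonal move: match or substitute.
                best = prev[d] + (a[i - 1] != b[j - 1])
                if d > 0:
                    best = min(best, cur[d - 1] + 1)  # insertion
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)  # deletion
            cur[d] = min(best, inf)
        prev = cur

    result = prev[m - n + k]
    return result if result <= k else None


def edit_distance(a: str, b: str) -> int:
    """Double the threshold k until the banded computation succeeds."""
    if len(a) > len(b):
        a, b = b, a  # iterate rows over the shorter string
    k = max(1, len(b) - len(a))
    while True:
        d = bounded_edit_distance(a, b, k)
        if d is not None:
            return d
        k *= 2


print(edit_distance("kitten", "sitting"))  # prints 3
```

The work done depends on the answer s rather than on the full n × m table, which is the sense in which such methods are output-sensitive. Starting the threshold at the length difference reflects the lower bound s ≥ |n-m|.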

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing