Approximate pattern matching with k-mismatches in packed text

Emanuele Giaquinta; Szymon Grabowski; Kimmo Fredriksson

arXiv:1211.5433·cs.DS·August 1, 2013

TL;DR

This paper introduces new algorithms for approximate pattern matching with $k$-mismatches in packed text, improving on known time bounds in the $AC^0$ and word-RAM models and extending the techniques to other approximate matching problems.

Contribution

The authors develop faster algorithms for k-mismatch pattern matching in packed text, optimizing performance in the $AC^0$ and word-RAM models, and introduce techniques applicable to other approximate matching problems.

Findings

01

Improved time complexity bounds for $k$-mismatch pattern matching in packed text.

02

Algorithms outperform existing bounds for $w = \Omega(\log^{1+\varepsilon} n)$.

03

Extended techniques to solve other approximate matching problems.

Abstract

Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all the substrings in $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n \sqrt{k \log k})$ and $O(n + n \sqrt{k/w} \log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w / \log \sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log \sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O((n \log \sigma / \log n) \lceil m \log(k + \log n / \log \sigma) / w \rceil + n^{\varepsilon})$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O(\frac{n}{\lfloor w/(m\log\sigma) \rfloor} (1 + \log…
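To make the problem statement and the packed-text setting concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it only illustrates the definition of $k$-mismatches and the basic idea of comparing many packed characters with one XOR plus a population count. The function names (`k_mismatch_naive`, `pack`, `packed_mismatches`, `k_mismatch_packed`) and the DNA alphabet are assumptions chosen for illustration, and Python's unbounded integers stand in for a machine word of length $w$.

```python
def k_mismatch_naive(text, pattern, k):
    """Positions i where the Hamming distance between text[i:i+m] and pattern is <= k."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the budget
        if mismatches <= k:
            positions.append(i)
    return positions


def pack(s, alphabet):
    """Pack s into one integer, b = ceil(log2(sigma)) bits per character.

    Character j occupies bits [j*b, (j+1)*b). Returns (packed_value, b).
    """
    b = max(1, (len(alphabet) - 1).bit_length())  # ceil(log2(sigma)), at least 1
    code = {c: idx for idx, c in enumerate(alphabet)}
    word = 0
    for j, c in enumerate(s):
        word |= code[c] << (j * b)
    return word, b


def packed_mismatches(x, y, b, m):
    """Count the b-bit fields (out of m) in which packed values x and y differ."""
    diff = x ^ y
    folded = diff
    for shift in range(1, b):
        folded |= diff >> shift  # OR every bit of a field into its lowest bit
    low_bits = sum(1 << (j * b) for j in range(m))  # lowest bit of each field
    return bin(folded & low_bits).count("1")


def k_mismatch_packed(text, pattern, k, alphabet):
    """Same output as k_mismatch_naive, but each window is compared as one packed value."""
    n, m = len(text), len(pattern)
    packed_text, b = pack(text, alphabet)
    packed_pattern, _ = pack(pattern, alphabet)
    window_mask = (1 << (m * b)) - 1
    positions = []
    for i in range(n - m + 1):
        window = (packed_text >> (i * b)) & window_mask
        if packed_mismatches(window, packed_pattern, b, m) <= k:
            positions.append(i)
    return positions


if __name__ == "__main__":
    print(k_mismatch_naive("ACGTACGA", "ACGT", 1))
    print(k_mismatch_packed("ACGTACGA", "ACGT", 1, "ACGT"))
```

With $\sigma = 4$ each character takes 2 bits, so a 64-bit word would hold 32 characters. The sketch still visits every text position, so it only shows the per-window saving from packing, not the improved bounds stated in the abstract.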

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing