String Matching with Inversions and Translocations in Linear Average Time (Most of the Time)

Szymon Grabowski; Simone Faro; Emanuele Giaquinta

arXiv:1012.0280·cs.DS·May 9, 2013

TL;DR

This paper introduces an efficient filtering-based algorithm for approximate string matching that allows translocations of equal-length adjacent factors and inversions of factors. It runs in linear average time when the alphabet is large enough relative to the pattern length, and the authors report good practical performance in experiments.

Contribution

The paper presents a novel filtering-based algorithm for approximate pattern matching allowing translocations and inversions, with proven worst-case and average-case time complexities.
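The summary above does not spell out how the filter works. As a rough illustration of what a filter for this problem can look like, note that inversions and translocations only rearrange characters, so a text window can match only if it has the same character counts as the pattern. The sketch below uses that observation as a sliding-window counting filter. It is an assumption made for illustration, not a description of the paper's actual filter, and the name `candidate_windows` is hypothetical.

```python
from collections import Counter


def candidate_windows(p, t):
    """Return start positions s where t[s:s+len(p)] is a permutation of p.

    Illustrative counting filter only (an assumption, not the paper's
    method): windows that fail this test cannot match under inversions
    and translocations, because both operations preserve character counts.
    """
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    need = Counter(p)
    window = Counter(t[:m])
    out = [0] if window == need else []
    for s in range(1, n - m + 1):
        # Slide the window one position: drop t[s-1], add t[s+m-1].
        left = t[s - 1]
        window[left] -= 1
        if window[left] == 0:
            del window[left]  # keep only positive counts so == is exact
        window[t[s + m - 1]] += 1
        if window == need:
            out.append(s)
    return out
```

Windows that pass would still need a full verification step. Comparing the two counters costs time proportional to the number of distinct characters per step, so this sketch favors clarity over speed.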

Findings

01

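Worst-case time complexity is O(nm max(α, β)), where α and β are the maximum lengths of the factors involved in any translocation and inversion, respectively.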

02

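Average-case time complexity is O(n) whenever σ = Ω(log m / log log^(1−ε) m) for some ε > 0, where σ is the alphabet size, assuming equiprobable and independent characters.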

03

Experimental results show high practical efficiency.

Abstract

We present an efficient algorithm for finding all approximate occurrences of a given pattern $p$ of length $m$ in a text $t$ of length $n$ allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an $O(nm \max(\alpha, \beta))$-time complexity in the worst case and $O(\max(\alpha, \beta))$-space complexity, where $\alpha$ and $\beta$ are respectively the maximum length of the factors involved in any translocation and inversion. Moreover we show that under the assumptions of equiprobability and independence of characters our algorithm has a $O(n)$ average time complexity, whenever $\sigma = \Omega(\log m / \log \log^{1 - \epsilon} m)$, where $\epsilon > 0$ and $\sigma$ is the dimension of the alphabet. Experiments show that the new proposed algorithm achieves very good results in…
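To make the matching problem concrete, here is a minimal brute-force sketch that checks each text window with a simple dynamic program. It assumes the usual reading of the problem: each pattern position takes part in at most one operation, a translocation swaps two adjacent factors of equal length at most `alpha`, and an inversion reverses a factor of length at most `beta`. This is a plain reference checker for the problem definition, not the filtering algorithm of the paper, and the function names are hypothetical.

```python
def matches_window(p, w, alpha, beta):
    """Check whether p can be turned into w using disjoint inversions
    (factor length <= beta) and translocations of adjacent equal-length
    factors (each factor of length <= alpha)."""
    m = len(p)
    if len(w) != m:
        return False
    # ok[i] is True when p[:i] can be transformed into w[:i].
    ok = [False] * (m + 1)
    ok[0] = True
    for i in range(1, m + 1):
        # Case 1: the last character matches directly.
        if ok[i - 1] and p[i - 1] == w[i - 1]:
            ok[i] = True
            continue
        # Case 2: an inversion of length k ends at position i.
        for k in range(2, min(beta, i) + 1):
            if ok[i - k] and p[i - k:i][::-1] == w[i - k:i]:
                ok[i] = True
                break
        if ok[i]:
            continue
        # Case 3: a translocation of two adjacent length-k factors ends at i.
        for k in range(1, min(alpha, i // 2) + 1):
            if (ok[i - 2 * k]
                    and p[i - 2 * k:i - k] == w[i - k:i]
                    and p[i - k:i] == w[i - 2 * k:i - k]):
                ok[i] = True
                break
    return ok[m]


def find_occurrences(p, t, alpha, beta):
    """Return all start positions where p matches t under the model above."""
    m = len(p)
    return [s for s in range(len(t) - m + 1)
            if matches_window(p, t[s:s + m], alpha, beta)]
```

For example, `find_occurrences("abcd", "xxabdcxx", 1, 0)` reports position 2, since swapping the adjacent factors `c` and `d` turns `abcd` into `abdc`. The sketch is far slower than the bounds stated in the abstract because it compares substrings directly at every step.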
