Longest Common Prefixes with $k$-Errors and Applications

Lorraine A.K. Ayad; Panagiotis Charalampopoulos; Costas S. Iliopoulos; Solon P. Pissis

arXiv:1801.04425·cs.DS·January 16, 2018

TL;DR

This paper introduces improved average-case algorithms for computing, for each suffix of a string, the longest prefix that occurs elsewhere in the string with up to $k$ errors. The algorithms cover both the Hamming and edit distance models and target data such as DNA sequences.

Contribution

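It improves the state-of-the-art average-case time complexity of the longest-prefix-with-$k$-errors problem for non-constant $k$, using only linear space under the Hamming distance model, and extends the technique to the edit distance model with the same time and space complexities.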

Findings

01

Algorithms run in $O(n \log^k n \log \log n)$ time on average

02

Applicable to computational biology and other fields

03

Improves on the state-of-the-art average-case time complexity for non-constant $k$ under the Hamming distance model

Abstract

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length $n$ over a constant-sized alphabet that occurs elsewhere in the string with $k$-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant $k$ and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in $O(n \log^{k} n \log \log n)$ time on average using $O(n)$ …
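To make the problem statement concrete, here is a minimal brute-force sketch for the Hamming distance case. It is not the algorithm from the paper: it compares every pair of suffixes directly, so it takes quadratic time or worse and serves only as a reference for what the output looks like. The function names `lcp_k_mismatches` and `longest_prefix_with_k_mismatches` are hypothetical, chosen for this illustration.

```python
def lcp_k_mismatches(s: str, i: int, j: int, k: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:]
    when at most k mismatching positions are tolerated."""
    n = len(s)
    length = 0
    errors = 0
    while i + length < n and j + length < n:
        if s[i + length] != s[j + length]:
            errors += 1
            if errors > k:
                # The (k+1)-th mismatch ends the prefix.
                break
        length += 1
    return length


def longest_prefix_with_k_mismatches(s: str, k: int) -> list[int]:
    """For each suffix s[i:], the length of its longest prefix that
    occurs at some other position j != i with at most k mismatches.
    Naive baseline (assumption: illustration only, not the paper's method)."""
    n = len(s)
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if j != i:
                result[i] = max(result[i], lcp_k_mismatches(s, i, j, k))
    return result


if __name__ == "__main__":
    print(longest_prefix_with_k_mismatches("abcabd", 0))
    print(longest_prefix_with_k_mismatches("abcabd", 1))
```

With $k = 0$ this reduces to the ordinary longest repeated prefix of each suffix. Allowing $k = 1$ lets the suffix at position 0 of `abcabd` match the suffix at position 3 across the `c`/`d` mismatch, so its value grows from 2 to 3.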
