Longest Common Prefixes with $k$-Errors and Applications
Lorraine A.K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis

TL;DR
This paper introduces improved average-case algorithms for computing longest common prefixes with up to $k$ errors in strings, motivated by biological data such as DNA sequences, under both the Hamming and edit distance models.
Contribution
It presents the first average-case algorithms with linear space for the longest common prefix with $k$-errors problem, extends them to the edit distance model, and demonstrates broad applicability.
Findings
Algorithms run in $\mathcal{O}(n \log^k n \log\log n)$ average time
Applicable to computational biology and other fields
Achieves improvements over previous methods
Abstract
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing the longest prefix of each suffix of a given string of length $n$ over a constant-sized alphabet that occurs elsewhere in the string with $k$-errors. This problem has already been studied under the Hamming distance model. Our first result is an improvement upon the state-of-the-art average-case time complexity for non-constant $k$ and using only linear space under the Hamming distance model. Notably, we show that our technique can be extended to the edit distance model with the same time and space complexities. Specifically, our algorithms run in $\mathcal{O}(n \log^k n \log\log n)$ time on average using …
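To make the problem statement concrete, the following is a minimal brute-force sketch for the Hamming distance case only. It is not the paper's algorithm: it compares every pair of suffixes directly and takes cubic time in the worst case, so it serves only as a reference definition of the output. The function names `lcp_with_k_mismatches` and `longest_prefixes_k_mismatches` are hypothetical, chosen for illustration.

```python
from typing import List


def lcp_with_k_mismatches(x: str, i: int, j: int, k: int) -> int:
    """Length of the longest common prefix of x[i:] and x[j:]
    when up to k mismatching positions are tolerated (Hamming model)."""
    n = len(x)
    length = 0
    mismatches = 0
    while i + length < n and j + length < n:
        if x[i + length] != x[j + length]:
            if mismatches == k:
                break  # a (k+1)-th mismatch would be needed; stop here
            mismatches += 1
        length += 1
    return length


def longest_prefixes_k_mismatches(x: str, k: int) -> List[int]:
    """For each suffix x[i:], the length of its longest prefix that occurs
    at some other starting position j != i with at most k mismatches.
    Naive baseline (assumption: occurrences may overlap position i)."""
    n = len(x)
    return [
        max(
            (lcp_with_k_mismatches(x, i, j, k) for j in range(n) if j != i),
            default=0,  # a string of length 1 has no other position
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    print(longest_prefixes_k_mismatches("abcabd", 0))
    print(longest_prefixes_k_mismatches("abcabd", 1))
```

For the string `abcabd`, the suffix starting at position 0 shares the exact prefix `ab` with the suffix at position 3; allowing one mismatch extends the match to `abc` against `abd`. A baseline like this can be useful for checking a faster implementation on small inputs.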
