On the Complexity of Finding Approximate LCS of Multiple Strings

Hamed Hasibi; Neerja Mhaskar; W. F. Smyth

arXiv:2505.15992 · cs.DS · September 22, 2025

Open Access

TL;DR

This paper investigates the computational complexity of finding approximate longest common substrings among multiple strings, proposing efficient algorithms for certain cases and establishing lower bounds under complexity hypotheses.

Contribution

It introduces algorithms for restricted ALCS variants using advanced data structures and analyzes their complexity, extending the study to indeterminate strings.

Findings

01 Algorithms with quadratic and near-linear run times for specific ALCS variants

02 Conditional lower bounds based on the Strong Exponential Time Hypothesis

03 Extension of methods to indeterminate strings

Abstract

Finding an Approximate Longest Common Substring (ALCS) within a given set $S = \{s_{1}, s_{2}, \dots, s_{m}\}$ of $m \geq 2$ strings is a key problem in computational biology, with applications such as identifying related mutations across multiple genetic sequences. We study several variants of ALCS problems that, given integers $k$ and $t \leq m$, seek the longest string $u$ (or the longest substring $u$ of any string in $S$) that lies within distance $k$ of at least one substring in $t$ distinct strings from $S$. While the general problems are NP-hard, we present efficient algorithms for restricted cases under Hamming and edit distances using the $LCP_{k}$ and $k$-errata tree data structures. Our methods achieve run times of $O(N^{2})$, $O(k \ell N^{2})$, and $O(m N \log^{k} \ell)$, where $\ell$ is the length of the longest string and $N$ is the sum of the lengths of all the strings in $S$.…
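To make the restricted problem statement concrete, here is a minimal brute-force sketch in Python for the variant in which $u$ must be a substring of some string in $S$ and the distance is Hamming distance. It is not the paper's method: it does not use the $LCP_{k}$ or $k$-errata tree structures and is far slower than the run times quoted above. The function names (`hamming`, `within_k`, `brute_force_alcs`) are hypothetical. The sketch also assumes that the string containing $u$ counts as one of the $t$ distinct strings.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def within_k(u: str, s: str, k: int) -> bool:
    """True if some length-|u| substring of s is within Hamming distance k of u."""
    n = len(u)
    return any(hamming(u, s[i:i + n]) <= k for i in range(len(s) - n + 1))


def brute_force_alcs(strings: List[str], k: int, t: int) -> str:
    """Longest substring u of any input string that is within Hamming
    distance k of a substring in at least t of the input strings.

    Assumption: the string that u is taken from counts toward t.
    """
    best = ""
    for src in strings:
        for i in range(len(src)):
            # Only candidates strictly longer than the current best matter.
            for j in range(i + len(best) + 1, len(src) + 1):
                u = src[i:j]
                hits = sum(within_k(u, s, k) for s in strings)
                if hits < t:
                    # Any extension of u would also fail, because a prefix
                    # of an approximate match is itself an approximate match.
                    break
                best = u
    return best


if __name__ == "__main__":
    S = ["abcde", "xbcdz", "abxde"]
    print(brute_force_alcs(S, k=1, t=3))
    print(brute_force_alcs(S, k=1, t=2))
```

The early `break` relies on a monotonicity property of Hamming distance: if `u` has no approximate occurrence in enough strings, no extension of `u` to the right can have one either. Setting `k=0` reduces the sketch to an exact longest common substring search over at least `t` strings.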

Taxonomy

Topics: Algorithms and Data Compression · Data Mining Algorithms and Applications · Face and Expression Recognition