A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Jin Cao; Dewei Zhong

arXiv:2009.03352 · cs.DS · September 9, 2020

TL;DR

This paper introduces Random-MCS, a fast randomized algorithm for finding maximal common subsequences of multiple strings. Its complexity is linear in the number of strings, which makes it efficient when there are many strings, and repeated runs often yield a solution close to the longest common subsequence (LCS).

Contribution

The paper presents a novel randomized algorithm for finding a Maximal Common Subsequence (MCS) whose complexity is linear in the number of strings, making it suitable for large-scale problems.
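The summary does not describe how Random-MCS works internally, so the following is only a minimal sketch under an assumed, naive strategy that is not the paper's algorithm: start from the empty subsequence and keep applying a randomly chosen single-character insertion that stays common to all strings, stopping when no such insertion exists. Because it stops only when no insertion works, the result is a maximal common subsequence by definition, and different random seeds can produce different ones. The names `is_subsequence`, `is_common` and `random_maximal_common_subsequence` are hypothetical. Each commonality check is one linear scan per string, so its cost grows linearly with the number of strings; the outer loop tries many candidate insertions and is meant only to make the idea concrete.

```python
import random


def is_subsequence(sub: str, s: str) -> bool:
    """Return True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    # 'ch in it' advances the iterator until ch is found or it is exhausted
    return all(ch in it for ch in sub)


def is_common(sub: str, strings: list[str]) -> bool:
    """One linear scan per string, so the cost grows linearly with len(strings)."""
    return all(is_subsequence(sub, s) for s in strings)


def random_maximal_common_subsequence(strings: list[str], rng: random.Random) -> str:
    """Hypothetical illustration only; NOT the paper's Random-MCS.

    Repeatedly applies a randomly chosen valid single-character insertion
    until none remains, so the returned subsequence is maximal.
    """
    # Any character of a common subsequence must occur in strings[0].
    alphabet = sorted(set(strings[0]))
    current = ""
    while True:
        # Every way of inserting one character at one position.
        candidates = [
            current[:i] + ch + current[i:]
            for i in range(len(current) + 1)
            for ch in alphabet
        ]
        rng.shuffle(candidates)
        for cand in candidates:
            if is_common(cand, strings):
                current = cand
                break
        else:
            # No insertion keeps the subsequence common: it is maximal.
            return current
```

For example, `random_maximal_common_subsequence(["ACGTAG", "AGCTTG", "ACTAGG"], random.Random(0))` returns some maximal common subsequence of the three strings; running it with several seeds and keeping the longest result is one simple way to act on the repeated-runs observation in the findings.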

Findings

01

The algorithm's complexity is linear in the number of strings.

02

Repeated runs often yield solutions close to the LCS.

03

The approach is supported by both theoretical analysis and experiments, including a study of the occurrence probability of a single MCS instance.
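Judging how close a run comes to the LCS requires the LCS itself, which can be computed exactly only on small inputs. The sketch below is the textbook dynamic program generalized to $L$ strings, not a method from the paper; `exact_lcs` and `state_space_size` are hypothetical names, and $n_i$ (the length of the $i$-th string) is notation introduced here. The memo table is indexed by one position per string, so it can visit up to $\prod_{i=1}^{L}(n_i+1)$ states, a count that grows exponentially with $L$.

```python
from functools import lru_cache
from math import prod


def exact_lcs(strings: list[str]) -> str:
    """Exact LCS of several strings by memoized recursion over position tuples.

    Illustrative sketch: the number of distinct states is at most
    prod(len(s) + 1), exponential in the number of strings.
    """
    seqs = tuple(strings)

    @lru_cache(maxsize=None)
    def best(pos: tuple[int, ...]) -> str:
        # Some string is exhausted: nothing more can be matched.
        if any(p == len(s) for p, s in zip(pos, seqs)):
            return ""
        chars = [s[p] for p, s in zip(pos, seqs)]
        if all(c == chars[0] for c in chars):
            # All current characters agree: taking it is always optimal.
            return chars[0] + best(tuple(p + 1 for p in pos))
        # Otherwise some string's current character is unused in an optimal
        # solution; try skipping it in each string and keep the longest.
        options = (
            best(pos[:i] + (pos[i] + 1,) + pos[i + 1:]) for i in range(len(pos))
        )
        return max(options, key=len)

    return best((0,) * len(seqs))


def state_space_size(strings: list[str]) -> int:
    """Upper bound on the number of memoized states: prod(n_i + 1)."""
    return prod(len(s) + 1 for s in strings)
```

For ten strings of length 20, `state_space_size` gives $21^{10} \approx 1.67 \times 10^{13}$ states, which illustrates why exact LCS is impractical for large $L$.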

Abstract

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard; known exact algorithms have computational complexity exponential in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. LCS is the special case of MCS with the greatest length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability for a single instance of MCS and demonstrate via both theoretical…
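The definition of maximality above can be checked directly: a candidate must be a subsequence of every string, and no single-character insertion at any position may keep it common. The following minimal sketch does exactly that; the function names are hypothetical, and the brute-force insertion loop is chosen for clarity rather than speed.

```python
def is_subsequence(sub: str, s: str) -> bool:
    """Return True if sub can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in sub)


def is_maximal_common_subsequence(sub: str, strings: list[str]) -> bool:
    """Check the definition: common, and no single insertion stays common."""
    if not all(is_subsequence(sub, s) for s in strings):
        return False
    # Candidate characters to insert: everything that appears in the input.
    alphabet = set().union(*strings)
    for i in range(len(sub) + 1):
        for ch in alphabet:
            cand = sub[:i] + ch + sub[i:]
            if all(is_subsequence(cand, s) for s in strings):
                return False  # an insertion keeps it common: not maximal
    return True
```

With `["abc", "cab"]`, both `"ab"` and `"c"` pass the check, while `"a"` fails because inserting `b` after it gives the common subsequence `"ab"`. The example also shows that an MCS can be shorter than the LCS (`"ab"` here), which is why the LCS is only the longest special case of an MCS.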

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques