Longest Common Subsequence in k-length substrings
Gary Benson, Avivit Levy, Riva Shalom

TL;DR
This paper introduces LCSk, a problem motivated by computational biology that generalizes the classic LCS, and gives algorithms for LCSk (O(n^2) time) and for a complementary distance measure, EDk (O(nm) time, O(km) space).
Contribution
It defines the LCSk problem, which extends LCS from single-character matches to matches of k-length substrings, and presents an LCSk algorithm whose running time equals that of the classical case k = 1, together with an algorithm for the EDk distance.
Findings
LCSk can be computed in O(n^2) time for strings of length n.
The EDk distance can be computed in O(nm) time and O(km) space for input strings of lengths n and m.
The algorithms generalize classical LCS solutions to k-length substrings.
Abstract
In this paper we define a new problem, LCSk, motivated by computational biology, aiming at finding the maximal number of substrings of length k matching in both input strings while preserving their order of appearance. The traditional LCS definition is a special case of our problem, where k = 1. We provide an algorithm solving the general case in O(n^2) time, where n is the length of the input strings, equaling the time required for the special case of k = 1. The space requirement of the algorithm is O(kn). We also define a complementary EDk distance measure and show that EDk(A, B) can be computed in O(nm) time and O(km) space, where m, n are the lengths of the input sequences A and B respectively.
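The problem statement can be made concrete with a small dynamic program. The sketch below is not taken from the paper. It assumes the natural extension of the classical LCS recurrence: a cell may add one to the value k positions back on the diagonal whenever the last k characters of both prefixes agree. The name `lcsk` and the tables `run` and `best` are illustrative. For readability it keeps full (n+1) x (m+1) tables rather than reproducing the paper's reduced-space bound.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Count the most length-k common substrings that can be matched in order.

    Illustrative sketch only: assumes matched substrings do not overlap and
    must appear in the same relative order in both strings.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Skip the last character of either prefix, as in classical LCS.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # If the last k characters of both prefixes match, use them
            # as one matched substring on top of the cell k steps back.
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]


if __name__ == "__main__":
    print(lcsk("ABCBDAB", "BDCABA", 1))   # k = 1 is ordinary LCS: prints 4
    print(lcsk("abcdabcd", "abcdabcd", 4))  # two blocks of length 4: prints 2
```

With k = 1 the extra condition reduces to a single character comparison, so the function returns the ordinary LCS length. The two nested loops do constant work per cell, which is consistent with the quadratic running time stated above.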
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms
