Longest Common Subsequence in k-length substrings

Gary Benson; Avivit Levy; Riva Shalom

arXiv:1402.2097 · cs.DS · February 11, 2014 · 1 citation

TL;DR

This paper introduces LCSk, a new problem motivated by computational biology that generalizes the classic LCS, and gives algorithms for both LCSk (O(n^2) time, O(kn) space) and a complementary distance measure, EDk (O(nm) time, O(km) space).

Contribution

It defines the LCSk problem, extending LCS from single characters to k-length substrings, and presents an O(n^2)-time algorithm for LCSk, matching the time of the classical k = 1 case, together with an O(nm)-time algorithm for the EDk distance.

Findings

01

LCSk can be computed in O(n^2) time for strings of length n.

02

The EDk distance measure can be computed in O(nm) time and O(km) space, where m and n are the lengths of the two input sequences.

03

The algorithms generalize classical LCS solutions to k-length substrings.

Abstract

In this paper we define a new problem, motivated by computational biology, $LCSk$, aiming at finding the maximal number of $k$-length substrings matching in both input strings while preserving their order of appearance. The traditional LCS definition is a special case of our problem, where $k = 1$. We provide an algorithm solving the general case in $O(n^{2})$ time, where $n$ is the length of the input strings, equaling the time required for the special case of $k = 1$. The space requirement of the algorithm is $O(kn)$. We also define a complementary $EDk$ distance measure and show that $EDk(A, B)$ can be computed in $O(nm)$ time and $O(km)$ space, where $m$, $n$ are the lengths of the input sequences $A$ and $B$ respectively.
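
To make the LCSk definition in the abstract concrete, the following is a minimal Python sketch of one dynamic program that computes it. It is an illustration written for this page, not code taken from the paper or from the linked repository. The function name `lcsk` and the table names `suffix` and `best` are assumptions. For simplicity the sketch stores full tables, so it uses O(nm) space rather than the O(kn) space stated in the abstract; since each row only reads rows i-1 and i-k, keeping a sliding window of rows would presumably bring it closer to that bound.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Maximal number of non-overlapping, order-preserving k-length
    substrings common to a and b (illustrative sketch, not the paper's code)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    n, m = len(a), len(b)
    # suffix[i][j]: length of the longest common suffix of a[:i] and b[:j]
    suffix = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                suffix[i][j] = suffix[i - 1][j - 1] + 1
            # Option 1: drop the last character of a or of b.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Option 2: a k-length match ends at (i, j); extend the
            # solution for the prefixes that end k characters earlier.
            if suffix[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]
```

With k = 1 the match test reduces to a single character comparison and the recurrence becomes the classical LCS recurrence, so `lcsk("ABCBDAB", "BDCABA", 1)` returns 4. For `lcsk("ABCDEF", "ABXCDEF", 2)` the result is 3, from the matched substrings AB, CD and EF.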

Code & Models

Repositories

fpavetic/lcskpp

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms