A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings

Sepideh Aghamolaei

arXiv:2008.01768 · cs.DS · January 13, 2021

Open Access

TL;DR

This paper introduces a novel data-structure for efficiently approximating the longest common subsequence among a set of strings, using a tree structure to enable sublinear query times with controlled approximation error.

Contribution

It presents a new data-structure that preprocesses multiple strings for faster approximate LCS queries, incorporating error tolerance and extending to LIS problems.

Findings

01. Achieves sublinear-time approximate LCS queries

02. Handles error via character replacements in the approximation

03. Extends methodology to the longest increasing subsequence problem

Abstract

Given a set $I$ of $k$ strings, their longest common subsequence (LCS) is the longest string that is a subsequence of every string in $I$. A data-structure for this problem preprocesses $I$ so that the LCS of a set of query strings $Q$ with the strings of $I$ can be computed faster. Since the problem is NP-hard for arbitrary $k$, we allow an error in which some characters may be replaced by other characters. We define the approximation version of the problem with an extra input $m$, which is the length of the regular expression (regex) that describes the input, and the approximation factor is the logarithm of the number of possibilities in the regex returned by the algorithm, divided by the logarithm of the number of possibilities in the regex with the minimum number of possibilities. Then, we use a tree data-structure to achieve sublinear-time LCS queries. We also explain how the…
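To make the definition concrete, here is a minimal sketch of the exact (non-approximate) LCS of a set of strings, computed by brute force. It is not the paper's tree data-structure and does no preprocessing; the function names `is_subsequence` and `lcs_of_set` are hypothetical and chosen only for illustration. Its running time is exponential in the length of the shortest string, which gives some intuition for why the abstract turns to approximation when $k$ is arbitrary.

```python
from itertools import combinations


def is_subsequence(candidate, text):
    """Return True if `candidate` can be read left to right inside `text`."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters must be found strictly after earlier ones.
    return all(ch in remaining for ch in candidate)


def lcs_of_set(strings):
    """Brute-force exact LCS of a collection of strings (illustrative only).

    Enumerates subsequences of the shortest string, longest first, and
    returns the first one that is a subsequence of every input string.
    """
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for positions in combinations(range(len(shortest)), length):
            candidate = "".join(shortest[i] for i in positions)
            if all(is_subsequence(candidate, s) for s in strings):
                return candidate
    return ""


if __name__ == "__main__":
    print(lcs_of_set(["abcde", "ace", "aXcYe"]))
```

A baseline like this could serve as a correctness check for small inputs when experimenting with an approximate query structure.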

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing