Approximating LCS and Alignment Distance over Multiple Sequences
Debarati Das, Barna Saha

TL;DR
This paper develops approximation algorithms for multiple sequence alignment, covering both the longest common subsequence (LCS) and the alignment distance (AD) of $m$ sequences, with running times whose exponent is roughly $m/2$.
Contribution
It introduces new approximation algorithms for LCS and AD of multiple sequences, improving runtime and approximation factors under certain conditions.
Findings
When the LCS has length $\lambda n$, a common subsequence of length at least $\frac{\lambda^2 n}{2+\epsilon}$ can be found in $\tilde{O}_m(n^{\lfloor m/2 \rfloor + 1})$ time.
Approximate AD within a factor of 2 in $\tilde{O}_m(n^{\lceil m/2 \rceil})$ time.
Below-2 approximation for AD achieved under specific pseudorandomness conditions.
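To make the running-time bounds above concrete, the short sketch below tabulates their exponents, read as $\lfloor m/2 \rfloor + 1$ for the LCS result and $\lceil m/2 \rceil$ for the factor-2 AD result, next to the exponent $m$ of the standard dynamic program that fills one cell per $m$-tuple of positions. The function names are illustrative and not from the paper, and the comparison ignores the polylogarithmic and $m$-dependent factors hidden by $\tilde{O}_m$.

```python
def lcs_time_exponent(m: int) -> int:
    """Exponent of n in the LCS bound: floor(m/2) + 1."""
    return m // 2 + 1


def ad_time_exponent(m: int) -> int:
    """Exponent of n in the factor-2 AD bound: ceil(m/2)."""
    return (m + 1) // 2


def exact_dp_exponent(m: int) -> int:
    """Exponent of n for the full DP table over m-tuples of positions."""
    return m


if __name__ == "__main__":
    print("m  exact  LCS-approx  AD-2-approx")
    for m in range(2, 9):
        print(m, exact_dp_exponent(m), lcs_time_exponent(m), ad_time_exponent(m))
```

For small $m$ the saving is modest (for $m = 2$ the LCS exponent equals that of the exact table), but the gap grows linearly with $m$.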
Abstract
We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS)) or minimizes the number of unaligned symbols (the alignment distance (AD)). Multiple sequence alignment is a well-studied problem in bioinformatics and is used to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of LCS or AD of $m$ sequences each of length $n$ requires $\Theta(n^m)$ time unless the Strong Exponential Time Hypothesis is false. In this paper, we provide several results to approximate LCS and AD of multiple sequences. If the LCS of $m$ sequences each of length $n$ is $\lambda n$ for some $\lambda \in [0,1]$, then in …
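As a point of reference for the two quantities defined in the abstract, here is a minimal sketch of the exact (not approximate) computation on small inputs. It is not the paper's algorithm: it is the textbook dynamic program over tuples of prefix lengths, whose table grows as the product of the sequence lengths. The AD formula used here, total length minus $m$ times the LCS length, is an assumption derived from the abstract's description of AD as the number of unaligned symbols; the function names are illustrative.

```python
from functools import lru_cache
from typing import Sequence


def multi_lcs(seqs: Sequence[str]) -> int:
    """Exact LCS length of all sequences (exponential in the number of sequences)."""
    m = len(seqs)
    if m == 0:
        return 0

    @lru_cache(maxsize=None)
    def solve(idx: tuple) -> int:
        # idx[i] is the length of the prefix of seqs[i] under consideration.
        if min(idx) == 0:
            return 0
        last_symbols = {seqs[i][idx[i] - 1] for i in range(m)}
        if len(last_symbols) == 1:
            # All prefixes end in the same symbol: align it across all sequences.
            return 1 + solve(tuple(j - 1 for j in idx))
        # Otherwise at least one last symbol is unaligned: try dropping each.
        return max(
            solve(idx[:i] + (idx[i] - 1,) + idx[i + 1:]) for i in range(m)
        )

    return solve(tuple(len(s) for s in seqs))


def alignment_distance(seqs: Sequence[str]) -> int:
    """Number of unaligned symbols, assuming AD = total length - m * LCS."""
    return sum(len(s) for s in seqs) - len(seqs) * multi_lcs(seqs)


if __name__ == "__main__":
    example = ["ACGT", "AGT", "ACT"]
    print(multi_lcs(example), alignment_distance(example))
```

In the example, the only symbols shared in order by all three strings are `A` and `T`, so two symbols are aligned per sequence and the remaining four symbols are unaligned. The recursion is only practical for short inputs, which is the motivation for the approximation results summarized above.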
