Multiple sequence alignment for short sequences
Kristóf Takács

TL;DR
This paper investigates multiple sequence alignment (MSA) for short sequences, showing that optimal alignments can be obtained trivially for length-1 sequences under any metric and for length-2 sequences under the unit metric, which may inform better algorithms for longer sequences.
Contribution
It shows that for very short sequences, optimal MSA can be obtained trivially, providing insights into the complexity and potential approaches for longer sequences.
Findings
Optimal MSA for length-1 sequences with any metric is trivial.
Optimal MSA for length-2 sequences with unit metric is trivial.
Results may aid in developing faster algorithms for longer sequences.
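To make the length-1 finding concrete, here is a minimal sketch that scores two alignments of the same length-1 sequences. It rests on assumptions this summary does not state: that alignments are scored by the sum-of-pairs cost, that a gap counts as an ordinary symbol at distance 1 from any letter under the unit metric, and that "trivial" refers to the gap-free alignment. The names `sp_cost` and `unit_distance` are hypothetical and not taken from the paper.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def unit_distance(x, y):
    """Unit metric: 0 for identical symbols, 1 otherwise (gap treated as a symbol)."""
    return 0 if x == y else 1


def sp_cost(rows, dist=unit_distance):
    """Sum-of-pairs cost of an alignment given as equal-length rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    # For every column, add the distance between every pair of symbols in it.
    return sum(
        dist(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


# Three length-1 sequences, aligned without gaps (a single column).
gap_free = ["A", "C", "A"]
# The same sequences with the "C" shifted into its own column.
with_gaps = ["A-", "-C", "A-"]

print(sp_cost(gap_free), sp_cost(with_gaps))
```

In this one example the gap-free alignment is the cheaper of the two. That illustrates the finding for a single case only and is not a proof of it.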
Abstract
Multiple sequence alignment (MSA) has been one of the most important problems in bioinformatics for decades, and it is still studied intensively by many mathematicians and biologists. However, mostly because of the practical motivation of the problem, research on this topic focuses on aligning long sequences. This is understandable, since the sequences that need to be aligned (usually DNA or protein sequences) are generally quite long (e.g., at least 30-40 characters). Nevertheless, it is a challenging question exactly where MSA starts to become a truly hard problem (it is known that MSA is NP-complete [2]), and the key to answering this question is to examine short sequences. If the optimal alignment for short sequences could be determined in polynomial time, then these results may help to develop faster or more accurate heuristic algorithms for aligning long sequences.…
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Genomic variations and chromosomal abnormalities
