Algorithms for normalized multiple sequence alignments
Eloi Araujo, Luiz Rozante, Diego P. Rubert, Fabio V. Martinez

TL;DR
This paper introduces the first normalized methods for multiple sequence alignment (MSA). It defines new normalized scoring criteria, proves that optimizing them is NP-hard, and provides exact algorithms as well as approximation algorithms, with the aim of producing more accurate alignments.
Contribution
It develops the first normalized MSA techniques, establishes their computational complexity, and provides algorithms for computing such alignments in practice.
Findings
Finding an optimal alignment under the normalized MSA criteria is NP-hard.
Exact algorithms are proposed for the new criteria.
Approximation algorithms are provided for certain scoring matrices.
Abstract
Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, the social sciences, and other fields. While two sequences can be aligned quickly in many applications, aligning multiple sequences simultaneously is inherently more intricate. Although most multiple sequence alignment (MSA) formulations are NP-hard, several approaches have been developed, since MSA can outperform pairwise alignment methods and is necessary for some applications. Taking into account not only the similarities but also the lengths of the compared sequences (i.e. normalization) can yield better alignments than either unnormalized or post-normalized approaches. While some normalized methods have been developed for pairwise sequence alignment, none have been proposed for MSA. This work is a first effort towards the development of…
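To make the idea of normalization concrete, here is a minimal sketch that scores an existing MSA with the classic sum-of-pairs (SP) cost and then divides that cost by the alignment length. This is an illustrative assumption, not the paper's method: the unit costs, the division by the number of columns, and the helper names `pair_cost`, `sp_cost` and `normalized_sp_cost` are hypothetical and are not necessarily the normalized criteria the authors define. The sketch only evaluates a given alignment; finding an optimal one is the hard (NP-hard) part.

```python
from itertools import combinations


def pair_cost(a: str, b: str) -> int:
    """Hypothetical unit costs ('-' is a gap): 0 if the symbols are equal
    (including gap vs gap), 1 for a mismatch or a symbol vs a gap."""
    return 0 if a == b else 1


def sp_cost(msa: list[str]) -> int:
    """Sum-of-pairs cost: add pair_cost over every column and every pair of rows."""
    if not msa or len({len(row) for row in msa}) != 1:
        raise ValueError("an alignment needs at least one row and rows of equal length")
    total = 0
    for column in zip(*msa):  # iterate over the alignment column by column
        for a, b in combinations(column, 2):  # every unordered pair of rows
            total += pair_cost(a, b)
    return total


def normalized_sp_cost(msa: list[str]) -> float:
    """SP cost divided by the alignment length (number of columns)."""
    cost = sp_cost(msa)  # also validates the rows
    length = len(msa[0])
    if length == 0:
        raise ValueError("cannot normalize an empty alignment")
    return cost / length


if __name__ == "__main__":
    msa = ["AC-GT", "ACTGT", "A--GT"]
    print(sp_cost(msa), normalized_sp_cost(msa))  # 4 0.8
    print(normalized_sp_cost(["ACGT", "ACGA"]))  # 0.25
    print(normalized_sp_cost(["ACGTACGTAC", "ACGTACGTAA"]))  # 0.1
```

In this sketch the pairs `ACGT`/`ACGA` and `ACGTACGTAC`/`ACGTACGTAA` both have an SP cost of 1, yet their normalized costs are 0.25 and 0.1: once length is taken into account, the longer pair, which differs in a smaller fraction of its positions, scores better.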
