MSARC: Multiple Sequence Alignment by Residue Clustering
Michał Modzelewski, Norbert Dojer

TL;DR
MSARC is a guide-tree-free multiple sequence alignment algorithm based on graph clustering of residues. It matches the accuracy of the best progressive methods, surpasses existing non-progressive methods, and performs best on datasets whose similarity structure is poorly represented by a phylogenetic tree.
Contribution
MSARC is a novel multiple sequence alignment algorithm that clusters residues in a graph instead of following a guide tree, which improves accuracy on datasets poorly represented by a phylogenetic tree.
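This summary does not spell out how MSARC clusters residues. As a hedged illustration of the representation the title refers to, a multiple alignment can be viewed as a partition of all residues into clusters, one cluster per column. The sketch below assumes a hypothetical function `clusters_to_alignment` and a `(sequence, position)` residue encoding; neither is MSARC's code. It only checks whether a given partition forms a valid alignment and, if so, orders the clusters into columns by topological sort. How MSARC builds and clusters its residue graph is not shown.

```python
from collections import defaultdict, deque
from typing import Optional

Residue = tuple[int, int]  # (sequence index, 0-based position in the ungapped sequence)


def clusters_to_alignment(seqs: list[str], clusters: list[set[Residue]]) -> Optional[list[str]]:
    """Turn a partition of residues into gapped alignment rows ('-' = gap).

    Hypothetical helper for illustration. Returns None if the clusters cannot be
    the columns of one alignment: a residue is missing or repeated, a cluster holds
    two residues of the same sequence, or the clusters impose a cyclic (crossing) order.
    """
    expected = {(s, i) for s, seq in enumerate(seqs) for i in range(len(seq))}
    owner: dict[Residue, int] = {}
    for c, cluster in enumerate(clusters):
        seq_ids = [s for s, _ in cluster]
        if len(seq_ids) != len(set(seq_ids)):
            return None  # a column holds at most one residue per sequence
        for r in cluster:
            owner[r] = c
    if set(owner) != expected or sum(len(c) for c in clusters) != len(expected):
        return None  # every residue must lie in exactly one cluster

    # Consecutive residues of a sequence force their clusters into that order.
    succ: dict[int, set[int]] = defaultdict(set)
    indeg = [0] * len(clusters)
    for s, seq in enumerate(seqs):
        for i in range(len(seq) - 1):
            a, b = owner[(s, i)], owner[(s, i + 1)]
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    # Kahn's topological sort; leftover clusters mean the order constraints are cyclic.
    queue = deque(c for c in range(len(clusters)) if indeg[c] == 0)
    order: list[int] = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    if len(order) != len(clusters):
        return None

    # Emit one column per cluster, with a gap for sequences absent from it.
    by_seq = [{s: i for s, i in cluster} for cluster in clusters]
    return [
        "".join(seq[by_seq[c][s]] if s in by_seq[c] else "-" for c in order)
        for s, seq in enumerate(seqs)
    ]
```

For example, `clusters_to_alignment(["ACGT", "AGT"], [{(0, 0), (1, 0)}, {(0, 1)}, {(0, 2), (1, 1)}, {(0, 3), (1, 2)}])` yields `["ACGT", "A-GT"]`, while clusters that pair residues in crossing order return `None`.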
Findings
MSARC achieves alignment quality on BAliBASE comparable to the best progressive methods.
MSARC achieves substantially higher alignment quality than other non-progressive algorithms.
MSARC outperforms all other methods on sequence sets whose similarity structure is hardly represented by a phylogenetic tree.
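The hardest datasets are those whose similarity structure is hardly represented by a phylogenetic tree, but this summary does not say how the authors measure that. One standard way to quantify tree-likeness of pairwise distances is the four-point condition: a distance metric can be represented exactly by a tree with non-negative branch lengths if and only if, for every four taxa, the two largest of the sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal. The Python sketch below reports the largest violation. The function name `four_point_violation` is hypothetical, and this is not necessarily the criterion used in the paper.

```python
from itertools import combinations


def four_point_violation(d: list[list[float]]) -> float:
    """Largest four-point-condition violation of a symmetric distance matrix d.

    For each quadruple of taxa, sort the distance sums of the three pairings;
    for a tree metric the two largest are equal. Returns 0.0 when every
    quadruple is tree-like.
    """
    worst = 0.0
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        worst = max(worst, float(sums[2] - sums[1]))
    return worst


# Distances on the tree ((A,B),(C,D)) with unit leaf edges and an internal edge of 2.
tree_like = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
# Same A-B and C-D distances, but A-C/B-D and A-D/B-C disagree, so no tree fits exactly.
non_tree = [[0, 2, 4, 3], [2, 0, 3, 4], [4, 3, 0, 2], [3, 4, 2, 0]]
print(four_point_violation(tree_like))  # 0.0
print(four_point_violation(non_tree))   # 2.0
```

The check is O(n^4) in the number of sequences, which is acceptable for BAliBASE-sized sets but would need sampling for large families.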
Abstract
Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, the resulting alignments are biased by guide trees, especially for relatively distant sequences. We propose MSARC, a new graph-clustering-based algorithm that aligns sequence sets without guide trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to that of the best progressive methods and substantially higher than that of other non-progressive algorithms. Furthermore, MSARC outperforms all other methods on sequence sets whose similarity structure is hardly represented by a phylogenetic tree. These datasets are the most exposed to the guide-tree bias of alignments. MSARC is available at…
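The abstract reports alignment quality on BAliBASE without naming the measure in this summary. BAliBASE evaluations commonly use the sum-of-pairs (SP) score, the fraction of residue pairs aligned in the reference that the test alignment also aligns, and the total-column (TC) score, the fraction of reference columns reproduced exactly. The Python sketch below computes simplified versions over whole alignments. It is an illustration under stated assumptions: the official BAliBASE scoring program typically restricts scoring to annotated core blocks, and the function names `alignment_columns`, `aligned_pairs`, and `sp_tc` are hypothetical.

```python
from itertools import combinations

Residue = tuple[int, int]  # (sequence index, 0-based position in the ungapped sequence)


def alignment_columns(rows: list[str]) -> list[frozenset[Residue]]:
    """Convert equal-length gapped rows ('-' = gap) into columns of residue ids."""
    next_pos = [0] * len(rows)
    cols = []
    for c in range(len(rows[0])):
        col = []
        for s, row in enumerate(rows):
            if row[c] != "-":
                col.append((s, next_pos[s]))
                next_pos[s] += 1
        cols.append(frozenset(col))
    return cols


def aligned_pairs(cols: list[frozenset[Residue]]) -> set[tuple[Residue, Residue]]:
    """All residue pairs that share a column, each pair in sorted order."""
    pairs = set()
    for col in cols:
        pairs.update(combinations(sorted(col), 2))
    return pairs


def sp_tc(test_rows: list[str], ref_rows: list[str]) -> tuple[float, float]:
    """Simplified SP and TC scores of a test alignment against a reference."""
    ref_cols = alignment_columns(ref_rows)
    test_cols = alignment_columns(test_rows)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Only reference columns with at least two residues are scored (a simplification).
    scored = [col for col in ref_cols if len(col) >= 2]
    test_set = set(test_cols)
    tc = sum(col in test_set for col in scored) / len(scored)
    return sp, tc
```

For example, `sp_tc(["AC", "AC", "AC"], ["AC-", "AC-", "A-C"])` returns `(1.0, 0.5)`: SP only measures how many reference pairs are recovered, so over-aligning the third sequence is not penalized, while TC drops because the second reference column is not reproduced exactly.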
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Bioinformatics and Genomic Networks
