MuSAlS: A Fast Multiple Sequence Alignment Approach Using Hierarchical Clustering
Emily G. Light, Morgan Prior, Noah M. Daniels, Najib Ishaq

TL;DR
MuSAlS is a fast, scalable multiple sequence alignment tool that uses hierarchical clustering and dynamic programming to efficiently handle large datasets with competitive accuracy.
Contribution
The paper introduces MuSAlS, an MSA approach that combines hierarchical clustering with exact dynamic programming (Needleman-Wunsch) to improve scalability and speed.
Findings
Achieves competitive accuracy with state-of-the-art methods
Significantly faster runtime on large datasets
Effective for genomic and metagenomic data analysis
Abstract
Motivation: The multiple sequence alignment (MSA) problem has been extensively studied, with numerous approaches developed over recent years. With the rapid growth of sequence data, there is an increasing need for fast and accurate MSA tools that scale effectively to large datasets. Building on our previous work on CLAM, we are able to use exact dynamic programming (Needleman-Wunsch) while scaling to large datasets. We introduce MuSAlS (Multiple Sequence Alignment at Scale), a fast and scalable de novo MSA aligner. MuSAlS uses hierarchical clustering to construct a guide tree based on the Levenshtein distance metric, enabling efficient and accurate alignment through a bottom-up approach.
Results: MuSAlS achieves competitive accuracy compared to state-of-the-art methods while significantly improving runtime performance. This makes it a valuable tool for researchers analyzing large-scale…
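The abstract names three building blocks: Levenshtein distance, a guide tree from hierarchical clustering, and Needleman-Wunsch alignment. The Python sketch below illustrates each one in its simplest textbook form. It is not the authors' implementation and does not use CLAM. The function names, the single-linkage merge rule, and the scoring parameters (`match=1`, `mismatch=-1`, `gap=-1`) are all assumptions made for illustration. The sketch stops at pairwise alignment, so merging alignments at the internal nodes of the tree is not shown.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def guide_tree(seqs):
    """Naive single-linkage agglomerative clustering (assumed linkage).

    Returns the tree as nested tuples of sequence indices.
    """
    dist = {(i, j): levenshtein(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def linkage(x, y):
        # Smallest pairwise distance between members of two clusters.
        return min(dist[min(p, q), max(p, q)] for p in x for q in y)

    # Each cluster is (subtree, list of member indices).
    clusters = [(i, [i]) for i in range(len(seqs))]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]][1], clusters[ab[1]][1]))
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Exact global pairwise alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

In a bottom-up scheme, the pairs joined lowest in the tree would be aligned first. The naive clustering here computes all pairwise distances, so it is quadratic in the number of sequences and is only meant to make the idea concrete, not to reflect how the method scales.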
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks · Genome Rearrangement Algorithms
