Sample-Align-D: A High Performance Multiple Sequence Alignment System using Phylogenetic Sampling and Domain Decomposition
Fahad Saeed, Ashfaq Khokhar

TL;DR
Sample-Align-D is a scalable parallel system for multiple sequence alignment that significantly reduces computation time by partitioning sequences and aligning them in parallel, achieving near real-time results on large datasets.
Contribution
It introduces a novel parallel MSA algorithm using phylogenetic sampling and domain decomposition, enabling high performance on large sequence sets.
Findings
Aligned 2000 sequences in less than 10 minutes on a 16-node cluster.
Achieved alignment accuracy comparable to established sequential methods.
Reduced computation time from over 23 hours to minutes.
Abstract
Multiple Sequence Alignment (MSA) is one of the most computationally intensive tasks in Computational Biology. The best-known existing solutions for multiple sequence alignment take several hours (in some cases days) of computation time to align, for example, 2000 homologous sequences of average length 300. Inspired by the Sample Sort approach in parallel processing, in this paper we propose a highly scalable multiprocessor solution for the MSA problem in phylogenetically diverse sequences. Our method employs an intelligent scheme to partition the set of sequences into smaller subsets using a k-mer-count-based similarity index, referred to as k-mer rank. Each subset is then independently aligned in parallel using any sequential approach. Further fine-tuning of the local alignments is achieved using constraints derived from a global ancestor of the entire set. The proposed Sample-Align-D…
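The partitioning step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not define k-mer rank precisely, so the code assumes it is the mean fraction of shared k-mers between a sequence and a regularly drawn sample of the input. The function names (`kmer_counts`, `kmer_similarity`, `kmer_rank`, `partition_by_rank`), the default `k=3`, and the splitter selection are all hypothetical choices, loosely modeled on how Sample Sort picks splitters from a sorted sample.

```python
from bisect import bisect_right
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a, b, k=3):
    """Shared k-mers (with multiplicity) relative to the shorter sequence."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())
    denom = min(len(a), len(b)) - k + 1
    return shared / denom if denom > 0 else 0.0


def kmer_rank(seq, sample, k=3):
    """Assumed rank: mean k-mer similarity of seq to a sample of sequences."""
    return sum(kmer_similarity(seq, s, k) for s in sample) / len(sample)


def partition_by_rank(seqs, num_parts, k=3):
    """Sample-sort-style bucketing: pick splitters from sorted ranks, then bin."""
    # Regular sampling of the input is an assumption made for this sketch.
    sample = seqs[::max(1, len(seqs) // num_parts)]
    ranks = [kmer_rank(s, sample, k) for s in seqs]
    ordered = sorted(ranks)
    # num_parts - 1 splitters at evenly spaced positions in the sorted ranks.
    splitters = [ordered[(i * len(ordered)) // num_parts]
                 for i in range(1, num_parts)]
    buckets = [[] for _ in range(num_parts)]
    for seq, rank in zip(seqs, ranks):
        buckets[bisect_right(splitters, rank)].append(seq)
    return buckets
```

Each bucket returned by `partition_by_rank` could then be handed to a separate worker running any sequential aligner, which is the role the abstract assigns to the per-subset alignment step. The ancestor-based fine-tuning mentioned in the abstract is not covered by this sketch.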
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Algorithms and Data Compression
