HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing
Shixiang Wan, Quan Zou

TL;DR
HAlign-II is a scalable, efficient tool leveraging distributed computing for ultra-large biological sequence alignment and phylogenetic tree reconstruction, outperforming existing methods in speed, memory efficiency, and usability.
Contribution
This paper introduces HAlign-II, a novel distributed computing-based tool that significantly improves the performance of ultra-large sequence alignment and phylogenetic analysis.
Findings
Efficiently handles ultra-large biological sequences for MSA and phylogenetic tree construction.
Exhibits high memory efficiency and scalability with increased computing resources.
Provides a user-friendly web interface based on distributed computing infrastructure.
Abstract
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme growth of next-generation sequencing data has left a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types. Distributed and parallel computing is a crucial technique for accelerating ultra-large sequence analyses. Based on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction. Compared with most available state-of-the-art methods, our experimental results indicate the following: 1) HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large biological sequences; 2) HAlign-II shows extremely…
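The abstract does not describe HAlign-II's internals, so the following is only a minimal sketch, under stated assumptions, of why alignment work distributes well: if every input sequence is aligned against one shared reference, each pairwise job is independent and can be mapped across workers. The choice of the first sequence as the reference, the Needleman-Wunsch scoring parameters, and the helper names `nw_score` and `score_against_reference` are all hypothetical; a thread pool stands in for a Spark cluster. This is not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two DP rows, so memory is O(len(b)).
    """
    # Row for i = 0: aligning the empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def score_against_reference(seqs: list[str], workers: int = 4) -> list[int]:
    """Score every sequence against a shared reference in parallel.

    Hypothetical assumption: the first sequence serves as the reference.
    Each job depends only on (reference, sequence), so the map is
    embarrassingly parallel.
    """
    ref = seqs[0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: nw_score(ref, s), seqs))


if __name__ == "__main__":
    print(score_against_reference(["ACGT", "ACGT", "AGT"]))
```

On a Spark cluster the same per-sequence step would map naturally onto an RDD, for example `sc.parallelize(seqs).map(f).collect()` in PySpark. A pure-Python inner loop is used here only for readability; a production aligner would use compiled code.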
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Scientific Computing and Data Management
