Ultra-large alignments using Phylogeny-aware Profiles

Nam-phuong Nguyen; Siavash Mirarab; Keerthana Kumar; Tandy Warnow

arXiv:1504.01142·q-bio.GN·April 7, 2015

TL;DR

UPP is a machine learning-based method that produces highly accurate multiple sequence alignments for ultra-large and fragmentary datasets, supporting downstream analyses such as the estimation of evolutionary histories.

Contribution

The paper introduces UPP, a novel alignment method using an ensemble of Hidden Markov Models for ultra-large and fragmentary sequence datasets.

Findings

01 Achieves high accuracy on large datasets

02 Performs well with fragmentary sequences

03 Applies to both nucleotide and amino acid sequences

Abstract

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, an MSA method that uses a new machine learning technique - the Ensemble of Hidden Markov Models - that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Code & Models

Repositories

smirarab/sepp (Official)

Taxonomy

Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Algorithms and Data Compression