A Fast Template Based Heuristic For Global Multiple Sequence Alignment
Srikrishnan Divakaran, Arpit Mithal, and Namit Jain

TL;DR
This paper introduces a fast, template-based heuristic for global multiple sequence alignment that incorporates feature classification and weighting, improving biological accuracy by leveraging structural, functional, and evolutionary data.
Contribution
It provides a novel mechanism for explicitly specifying feature types and weights, and develops a scoring model based on segment conservation for more accurate alignments.
Findings
Enables explicit feature classification and weighting in template-based heuristics.
Defines a scoring model based on segment conservation rather than single residues.
Proposes a fast progressive alignment heuristic for efficient global MSA construction.
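To make the contrast between segment-level and single-residue scoring concrete, here is a minimal sketch in Python. It is not the authors' scoring model, which this summary does not specify. It assumes a simple stand-in: each column's conservation is the fraction of rows carrying its most common non-gap residue, and a segment's score is the weighted mean over its columns. The names `column_conservation` and `segment_scores`, the fixed-length non-overlapping segments, and the per-column `weights` (standing in for feature weights) are all hypothetical.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of rows carrying the column's most common non-gap residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def segment_scores(alignment, segment_length, weights=None):
    """Score non-overlapping segments as weighted means of column conservation.

    alignment: list of equal-length aligned strings ('-' marks a gap).
    weights:   optional per-column weights (hypothetical stand-in for
               feature weights); defaults to 1.0 for every column.
    """
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    if weights is None:
        weights = [1.0] * n_cols
    if len(weights) != n_cols:
        raise ValueError("need exactly one weight per column")

    per_column = [
        column_conservation([row[i] for row in alignment]) for i in range(n_cols)
    ]

    scores = []
    for start in range(0, n_cols, segment_length):
        end = min(start + segment_length, n_cols)
        seg_weights = weights[start:end]
        total = sum(seg_weights)
        if total == 0:
            scores.append(0.0)  # a segment with no weight contributes nothing
            continue
        weighted = sum(c * w for c, w in zip(per_column[start:end], seg_weights))
        scores.append(weighted / total)
    return scores


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "AC-T"]
    print(segment_scores(toy_alignment, segment_length=2))
```

Scoring whole segments means one poorly conserved column does not dominate the assessment of an otherwise well-conserved region, and raising the weights on columns that belong to a feature of interest shifts the score toward that feature.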
Abstract
Advances in biotechnology have made available massive amounts of functional, structural and genomic data for many biological sequences. This increased availability of heterogeneous biological data has given rise to biological applications in which a multiple sequence alignment (MSA) is required to align similar features, where a feature is described in structural, functional or evolutionary terms. In these applications, the optimal MSA for a given set of sequences is likely to differ depending on the feature of interest, and sequence similarity can only be used as a rough initial estimate of the accuracy of an MSA. This has motivated the growth of template-based heuristics that supplement the sequence information with evolutionary, structural and functional data and exploit feature similarity instead of sequence similarity to construct multiple sequence alignments that are…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
