A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity
M. Cyrus Maher, Ryan D. Hernandez

TL;DR
MOSAIC is a new tool that integrates diverse ortholog detection methods to greatly increase alignment completeness while maintaining or improving ortholog quality, enabling better comparative genomics analyses and detection of positively selected sites.
Contribution
This paper introduces MOSAIC, the first tool for integrating methodologically diverse ortholog detection methods, greatly increasing alignment completeness over individual methods while maintaining or improving ortholog quality.
Findings
MOSAIC more than quintuples the number of complete alignments (alignments in which all species are present).
It maintains or improves functional, phylogenetic, and sequence identity-based measures of ortholog quality.
It detects up to 180% more positively selected sites.
Abstract
Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we…
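To make the completeness metric concrete, the sketch below counts "complete" alignments (every species has an ortholog) for two ortholog detection methods and for a naive merge of them. It is a toy illustration under stated assumptions, not MOSAIC's cluster-optimization algorithm: the data layout, the species names, the gene IDs, and the helpers `count_complete` and `naive_merge` are all hypothetical. It only shows why pooling complementary methods can raise completeness.

```python
from typing import Dict, List

# Hypothetical layout: per method, query gene -> {species: ortholog ID}.
# A species missing from the inner dict means that method found no ortholog.
OrthologCalls = Dict[str, Dict[str, str]]


def is_complete(calls: Dict[str, str], species: List[str]) -> bool:
    """An alignment is complete if every species has an ortholog."""
    return all(sp in calls for sp in species)


def count_complete(method: OrthologCalls, species: List[str]) -> int:
    """Number of query genes whose ortholog set covers all species."""
    return sum(is_complete(calls, species) for calls in method.values())


def naive_merge(methods: List[OrthologCalls], species: List[str]) -> OrthologCalls:
    """Toy integration (NOT MOSAIC's method): for each gene and species,
    take the first method, in list order, that reports an ortholog."""
    merged: OrthologCalls = {}
    genes = set().union(*(m.keys() for m in methods))
    for gene in genes:
        merged[gene] = {}
        for sp in species:
            for m in methods:
                hit = m.get(gene, {}).get(sp)
                if hit is not None:
                    merged[gene][sp] = hit
                    break  # keep the highest-priority method's call
    return merged


# Hypothetical example data for three species.
species = ["human", "chimp", "mouse"]
method_a = {
    "g1": {"human": "h1", "chimp": "c1"},
    "g2": {"human": "h2", "chimp": "c2", "mouse": "m2"},
}
method_b = {
    "g1": {"human": "h1", "mouse": "m1"},
    "g2": {"human": "h2"},
}

merged = naive_merge([method_a, method_b], species)
print(count_complete(method_a, species), count_complete(method_b, species), count_complete(merged, species))
```

Here neither method alone yields a complete alignment for `g1`, but together they do. A naive merge like this ignores whether the pooled sequences are mutually consistent orthologs, which is why completeness gains must be checked against quality measures such as those the abstract lists.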
Taxonomy
Topics: Genomics and Phylogenetic Studies · Advanced Proteomics Techniques and Applications · Glycosylation and Glycoproteins Research
