Terrace Aware Phylogenomic Inference from Supermatrices
Olga Chernomor, Arndt von Haeseler, Bui Quang Minh

TL;DR
This paper introduces a phylogenetic terrace aware (PTA) data structure that speeds up supermatrix-based tree inference under partition models by skipping redundant likelihood calculations for trees on the same terrace. The benefit is largest for datasets with much missing data.
Contribution
The paper presents a novel terrace-aware data structure integrated into IQ-TREE, improving inference speed in supermatrix phylogenomics with missing data.
Findings
Achieves a 1.7- to 6-fold speedup on real datasets compared with a naïve implementation
Speedup correlates with the amount of missing data
Effective for large, sparse supermatrices
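The summary does not say how "amount of missing data" is measured. One common measure, assumed here, is the fraction of empty taxon-by-gene cells in the supermatrix. The Python sketch below computes it from a hypothetical `presence` map (taxon to the genes sequenced for it). It also inverts that map to get each gene's taxon set, which determines the partition subtrees that terraces depend on.

```python
from typing import Dict, Set


def missing_fraction(presence: Dict[str, Set[str]]) -> float:
    """Fraction of empty taxon-by-gene cells; `presence` maps taxon -> genes sequenced."""
    genes = set().union(*presence.values())
    cells = len(presence) * len(genes)
    if cells == 0:
        return 0.0
    filled = sum(len(g) for g in presence.values())
    return (cells - filled) / cells


def partition_taxa(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the presence map: gene -> taxa that have a sequence for that gene."""
    by_gene: Dict[str, Set[str]] = {}
    for taxon, genes in presence.items():
        for gene in genes:
            by_gene.setdefault(gene, set()).add(taxon)
    return by_gene


presence = {
    "A": {"g1", "g2"},
    "B": {"g1", "g2"},
    "C": {"g1", "g2"},
    "D": {"g1"},  # D lacks gene g2
    "E": {"g2"},  # E lacks gene g1
}
print(missing_fraction(presence))               # 0.2
print(sorted(partition_taxa(presence)["g2"]))   # ['A', 'B', 'C', 'E']
```

Here 2 of the 10 taxon-gene cells are empty. Sparser matrices give smaller per-gene taxon sets, and therefore more species trees that collapse to the same partition subtrees.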
Abstract
One approach in phylogenomics to infer the tree of life is based on concatenated multiple sequence alignments from many genes. Unfortunately, the resulting so-called supermatrix is usually sparse, that is, not every gene sequence is available for all species in the supermatrix. Due to the missing sequence information, a phylogenetic inference that assumes each gene evolves with its own substitution model suffers from phylogenetic terraces on which many phylogenetic trees show the same likelihood. Here, we propose a phylogenetic terrace aware (PTA) data structure for efficient supermatrix-based tree inference under partition models. PTA avoids likelihood computations for trees belonging to the same terrace. PTA is implemented in the IQ-TREE software and leads to a 1.7- to 6-fold speedup for real data sets compared with a naïve implementation. Speedups are independent of terrace…
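The abstract's central claim is that trees on the same terrace share a likelihood, so repeated evaluations can be skipped. The Python sketch below assumes the standard characterization: two trees lie on the same terrace when they induce identical subtrees on every partition's taxon set, as in partition models where each partition has its own branch lengths. Trees are nested tuples, and induced subtrees are compared through their nontrivial splits. `TerraceCache` is a hypothetical memo table that calls a stand-in `partition_loglik` only for induced subtrees it has not seen. It illustrates the idea of avoiding redundant computation, not IQ-TREE's actual PTA implementation, which this summary does not describe.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple, Union

# A tree is a nested tuple of leaf labels; the outermost tuple is an arbitrary root.
Tree = Union[str, tuple]
Split = FrozenSet[str]


def collect_clades(node: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set of every subtree below `node` to `out`; return node's leaf set."""
    if isinstance(node, str):
        leafset = frozenset([node])
    else:
        leafset = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(leafset)
    return leafset


def induced_splits(tree: Tree, taxa: Iterable[str]) -> FrozenSet[Split]:
    """Nontrivial splits of the subtree induced on `taxa` (its unrooted topology)."""
    subset = frozenset(taxa)
    clades: List[FrozenSet[str]] = []
    collect_clades(tree, clades)
    ref = min(subset)  # reference taxon used to pick one canonical side per split
    result = set()
    for clade in clades:
        side = clade & subset
        other = subset - side
        if len(side) >= 2 and len(other) >= 2:  # skip trivial splits
            result.add(other if ref in side else side)
    return frozenset(result)


def same_terrace(t1: Tree, t2: Tree, partitions: Iterable[Iterable[str]]) -> bool:
    """True if both trees induce identical subtrees on every partition's taxon set."""
    return all(induced_splits(t1, p) == induced_splits(t2, p) for p in partitions)


class TerraceCache:
    """Hypothetical memo table: per-partition scores keyed by induced subtree topology."""

    def __init__(self, partitions, partition_loglik: Callable[[int, FrozenSet[Split]], float]):
        self.partitions = [frozenset(p) for p in partitions]
        self.partition_loglik = partition_loglik  # stand-in for a real likelihood routine
        self.table: Dict[Tuple[int, FrozenSet[Split]], float] = {}
        self.evaluations = 0

    def loglik(self, tree: Tree) -> float:
        total = 0.0
        for i, part in enumerate(self.partitions):
            key = (i, induced_splits(tree, part))
            if key not in self.table:  # only unseen induced subtrees are evaluated
                self.evaluations += 1
                self.table[key] = self.partition_loglik(*key)
            total += self.table[key]
        return total


P1, P2 = {"A", "B", "C", "D"}, {"A", "B", "C", "E"}
T1 = (("A", "B"), "C", ("D", "E"))
T2 = (("A", "B"), ("C", "D"), "E")
T4 = (("A", "C"), "B", ("D", "E"))

print(same_terrace(T1, T2, [P1, P2]))  # True
print(same_terrace(T1, T4, [P1, P2]))  # False

cache = TerraceCache([P1, P2], lambda i, splits: -1.0 - i)  # dummy scores
for t in (T1, T2):
    cache.loglik(t)
print(cache.evaluations)  # 2: T2 reuses both partition scores of T1
```

T1 and T2 differ as species trees but both induce AB|CD on the first gene and AB|CE on the second, so the cache evaluates each partition only once. The key design assumption is that a partition's optimized score depends only on its induced topology. That holds when branch lengths are unlinked across partitions and would not hold for linked branch lengths.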
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Biomedical Text Mining and Ontologies
