From gene trees to species trees II: Species tree inference in the deep coalescence model
Louxin Zhang

TL;DR
This paper studies how gene trees relate to the species tree that contains them. It shows how to compute the deep coalescence cost efficiently, compares that cost with the gene duplication cost, and proves that inferring a species tree by minimizing deep coalescences is NP-hard.
Contribution
It establishes a linear-time method to compute deep coalescence costs and proves the NP-hardness of species tree inference based on this criterion.
Findings
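In the reconciliation of a uniquely leaf-labeled gene tree and a species tree, the deep coalescence cost equals the number of gene losses minus twice the gene duplication cost.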
Deep coalescence cost is always at least the gene duplication cost.
Species tree inference by minimizing deep coalescences is NP-hard.
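The first two findings can be checked on small trees. The sketch below is an illustration only, not the paper's linear-time algorithm: it finds each LCA by scanning all species clusters, which is quadratic. It assumes rooted binary trees written as nested tuples of species names, a gene tree in which every species appears exactly once, and the usual LCA-based definitions of duplication and loss. The names `species_depths` and `reconcile` are hypothetical. Deep coalescence is counted as extra lineages: the total length of the species-tree paths covered by gene-tree edges, minus one lineage per species-tree edge.

```python
def species_depths(tree):
    """Return {leaf set of node: depth} for every node of a nested-tuple tree."""
    depths = {}

    def walk(node, depth):
        if isinstance(node, str):
            cluster = frozenset([node])
        else:
            cluster = walk(node[0], depth + 1) | walk(node[1], depth + 1)
        depths[cluster] = depth
        return cluster

    walk(tree, 0)
    return depths


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses, deep coalescence cost) under LCA mapping.

    Assumes every species occurs exactly once among the gene tree leaves.
    """
    depths = species_depths(species_tree)
    dup = loss = path_sum = 0

    def lca_depth(cluster):
        # The species nodes whose leaf sets contain `cluster` form a chain of
        # ancestors; the deepest one is the LCA image of the gene node.
        return max(d for c, d in depths.items() if cluster <= c)

    def walk(node):
        nonlocal dup, loss, path_sum
        if isinstance(node, str):
            cluster = frozenset([node])
            return cluster, lca_depth(cluster)
        (ca, da), (cb, db) = walk(node[0]), walk(node[1])
        cluster = ca | cb
        d = lca_depth(cluster)
        gaps = [da - d, db - d]      # species edges crossed by each child edge
        path_sum += sum(gaps)
        if 0 in gaps:                # a child maps to the same species node
            dup += 1
            loss += sum(gaps)
        else:                        # speciation: skipped intermediate nodes
            loss += sum(gaps) - 2
        return cluster, d

    walk(gene_tree)
    species_edges = len(depths) - 1
    # Each species edge carries at least one lineage; the rest are "extra".
    deep_coalescence = path_sum - species_edges
    return dup, loss, deep_coalescence


species = (("A", "B"), "C")
gene = (("A", "C"), "B")
dup, loss, dc = reconcile(gene, species)
print(dup, loss, dc)  # prints: 1 3 1
```

On this example the deep coalescence cost is 1, which equals 3 losses minus 2 times 1 duplication and is no smaller than the duplication cost, in line with the findings listed above.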
Abstract
When gene copies are sampled from various species, the resulting gene tree might disagree with the containing species tree. The primary causes of gene tree and species tree discord include lineage sorting, horizontal gene transfer, and gene duplication and loss. Each of these events yields a different parsimony criterion for inferring the (containing) species tree from gene trees. With lineage sorting, species tree inference is to find the tree minimizing extra gene lineages that had to coexist along species lineages; with gene duplication, it becomes to find the tree minimizing gene duplications and/or losses. In this paper, we show the following results: (i) The deep coalescence cost is equal to the number of gene losses minus two times the gene duplication cost in the reconciliation of a uniquely leaf labeled gene tree and a species tree. The deep coalescence cost can be computed in…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Bioinformatics and Genomic Networks
