Anomaly zones for uniformly sampled gene trees under the gene duplication and loss model
Brandon Legried

TL;DR
This paper characterizes the anomaly zones of gene trees under the gene duplication and loss (GDL) model and shows that whether they exist depends on the species tree topology, which matters for the accuracy of phylogenomic inference.
Contribution
It extends the understanding of anomaly zones from the multispecies coalescent to the GDL model, analyzing their presence in different tree shapes.
Findings
Anomaly zones do not exist for balanced trees with four species.
Anomaly zones do exist for caterpillar trees, as they do under the multispecies coalescent.
The analysis is based on probabilistic trajectories of the GDL process.
Abstract
Recently, there has been interest in extending long-known results about the multispecies coalescent tree to other models of gene trees. Results about the gene duplication and loss (GDL) tree have mathematical proofs, including species tree identifiability, estimability, and sample complexity of popular algorithms like ASTRAL. Here, this work is continued by characterizing the anomaly zones of uniformly sampled gene trees. The anomaly zone for species trees is the set of parameters where some discordant gene tree occurs with maximal probability. The detection of anomalous gene trees is an important problem in phylogenomics, as their presence renders otherwise effective estimation methods positively misleading. Under the multispecies coalescent, anomaly zones are known to exist for rooted species trees with as few as four species. The gene duplication and loss process is a…
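The anomaly-zone definition in the abstract can be illustrated with a small check. This is a minimal sketch, not the paper's method: it assumes gene tree topologies are given as already-canonicalized Newick-style strings, reads "discordant" as "different from the species tree topology", and treats a parameter setting as anomalous when some discordant topology is strictly more probable than the matching one. The function names and the probabilities in the example are hypothetical and are not results from the paper.

```python
from typing import Dict, List


def anomalous_gene_trees(species_topology: str,
                         gene_tree_probs: Dict[str, float]) -> List[str]:
    """Return discordant topologies that are strictly more probable than
    the topology matching the species tree.

    Assumption: keys of gene_tree_probs are canonical topology strings,
    so string equality means topological equality.
    """
    matching = gene_tree_probs.get(species_topology, 0.0)
    return sorted(
        topology
        for topology, prob in gene_tree_probs.items()
        if topology != species_topology and prob > matching
    )


def in_anomaly_zone(species_topology: str,
                    gene_tree_probs: Dict[str, float]) -> bool:
    """True if at least one anomalous gene tree exists for these parameters."""
    return len(anomalous_gene_trees(species_topology, gene_tree_probs)) > 0


# Hypothetical probabilities for illustration only (other topologies omitted).
caterpillar = "(((A,B),C),D)"
balanced = "((A,B),(C,D))"
example_probs = {caterpillar: 0.10, balanced: 0.12}

caterpillar_is_anomalous = in_anomaly_zone(caterpillar, example_probs)
balanced_is_anomalous = in_anomaly_zone(balanced, example_probs)
```

With these made-up numbers the caterpillar species tree would count as being in an anomaly zone, because the balanced gene tree is more probable than the matching caterpillar gene tree, while the balanced species tree would not.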
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Bioinformatics and Genomic Networks
