Representing and extending ensembles of parsimonious evolutionary histories with a directed acyclic graph
Will Dumm, Mary Barker, William Howard-Snyder, William S. DeWitt, and Frederick A. Matsen IV

TL;DR
This paper introduces the history sDAG, a novel data structure that efficiently represents and extends ensembles of parsimonious phylogenetic trees, enabling better uncertainty quantification for large datasets.
Contribution
The paper develops the history sDAG, a new compact representation for ensembles of phylogenetic trees that can be efficiently constructed and extended to include more trees.
Findings
Efficient construction of the history sDAG from tree ensembles.
Ability to extend the ensemble to include additional parsimonious trees.
Potential use of the history sDAG as a skeleton for uncertainty quantification.
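To illustrate why a DAG can hold a tree ensemble compactly, here is a minimal toy sketch. It is not the authors' implementation. It assumes each node lists its child clades and each clade lists alternative child nodes, so a tree is obtained by picking one alternative per clade. The names `TOY_DAG` and `count_trees` are hypothetical.

```python
from functools import lru_cache
from math import prod

# Toy DAG (an assumption for illustration): node -> list of child clades,
# where each clade is a list of alternative child nodes. Leaves have no clades.
# x1/x2 and y1/y2/y3 stand for the same clade with different internal labels.
TOY_DAG = {
    "root": [["x1", "x2"], ["y1", "y2", "y3"]],
    "x1": [["a"], ["b"]],
    "x2": [["a"], ["b"]],
    "y1": [["c"], ["d"]],
    "y2": [["c"], ["d"]],
    "y3": [["c"], ["d"]],
    "a": [], "b": [], "c": [], "d": [],
}


def count_trees(dag, node):
    """Count the trees below `node` without enumerating them."""

    @lru_cache(maxsize=None)
    def count(n):
        # Choices within a clade add up; choices across clades multiply.
        # A leaf has no clades, and the empty product is 1.
        return prod(sum(count(child) for child in clade) for clade in dag[n])

    return count(node)


print(count_trees(TOY_DAG, "root"))  # 2 * 3 = 6 trees stored in 11 nodes
```

Because shared substructure is stored once, the number of represented trees can grow multiplicatively while the number of nodes grows additively.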
Abstract
In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques; however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g. ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to only represent maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Evolution and Paleontology Studies
