Novel Distances for Dollo Data
Michael Woodhams, Dorothy A. Steane, Rebecca C. Jones, Dean Nicolle, Vincent Moulton, Barbara R. Holland

TL;DR
This paper introduces the Additive Dollo Distance (ADD), a new distance on binary (presence/absence) data tailored to Dollo processes. ADD is consistent for data generated under a Dollo model, and it outperforms existing binary distances in simulations; the paper also applies it to real data.
Contribution
The paper presents the ADD, a novel distance for Dollo data, with proven consistency, theoretical properties, and improved performance over existing methods in simulations and empirical datasets.
Findings
ADD outperforms other binary distances on Dollo data.
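The LogDet distance performs poorly on Dollo data, which may have implications for its use in conditioned genome reconstruction.
ADD is consistent for data generated under a Dollo model and has useful theoretical properties, including a link to the LogDet distance.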
Abstract
We investigate distances on binary (presence/absence) data in the context of a Dollo process, where a trait can only arise once on a phylogenetic tree but may be lost many times. We introduce a novel distance, the Additive Dollo Distance (ADD), which is consistent for data generated under a Dollo model, and show that it has some useful theoretical properties including an intriguing link to the LogDet distance. Simulations of Dollo data are used to compare a number of binary distances including ADD, LogDet, Nei Li and some simple, but to our knowledge previously unstudied, variations on common binary distances. The simulations suggest that ADD outperforms other distances on Dollo data. Interestingly, we found that the LogDet distance performs poorly in the context of a Dollo process, which may have implications for its use in connection with conditioned genome reconstruction. We apply…
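To make the Dollo process described in the abstract concrete, the following is a minimal Python sketch, not the authors' simulation code. It generates presence/absence characters on a small hypothetical four-taxon tree: each trait arises exactly once and can then only be lost. The tree, its branch lengths, the `loss_rate` parameter, and the simplification that a trait arises at the lower end of a branch chosen in proportion to its length are all assumptions made for illustration. Because the ADD formula is not given in this summary, the sketch computes the well-known Nei-Li (Dice) dissimilarity instead, as an example of the kind of binary distance the paper compares.

```python
import math
import random

# Hypothetical rooted tree: internal node -> list of (child, branch_length).
TREE = {
    "root": [("ab", 0.5), ("cd", 0.5)],
    "ab": [("A", 0.3), ("B", 0.3)],
    "cd": [("C", 0.3), ("D", 0.3)],
}
LEAVES = ["A", "B", "C", "D"]


def simulate_dollo_character(tree, leaves, loss_rate, rng):
    """Simulate one presence/absence character under a simple Dollo process."""
    branches = [(parent, child, length)
                for parent, children in tree.items()
                for child, length in children]
    # The trait arises exactly once, on a branch picked in proportion to its
    # length. Assumption: it arises at the lower end of that branch.
    _, origin, _ = rng.choices(branches, weights=[b[2] for b in branches])[0]
    present = set()

    def descend(node):
        if node not in tree:  # leaf: the trait survived down to this taxon
            present.add(node)
            return
        for child, length in tree[node]:
            # The trait survives a branch with probability exp(-rate * length).
            # A loss is irreversible, so a lost lineage is not followed further.
            if rng.random() < math.exp(-loss_rate * length):
                descend(child)

    descend(origin)
    return {leaf: int(leaf in present) for leaf in leaves}


def nei_li_distance(x, y):
    """Nei-Li (Dice) dissimilarity between two 0/1 vectors of equal length."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # convention chosen here when neither taxon has any trait
    return 1.0 - 2.0 * shared / total


def simulate_matrix(n_chars, loss_rate, seed=0):
    """Return {leaf: [0/1, ...]} for n_chars independent Dollo characters."""
    rng = random.Random(seed)
    data = {leaf: [] for leaf in LEAVES}
    for _ in range(n_chars):
        char = simulate_dollo_character(TREE, LEAVES, loss_rate, rng)
        for leaf in LEAVES:
            data[leaf].append(char[leaf])
    return data
```

Calling `simulate_matrix(500, loss_rate=1.0)` and then `nei_li_distance(data["A"], data["B"])` for each pair of taxa gives a distance matrix that could be passed to a tree-building method. Characters absent from every taxon are kept in this sketch; real presence/absence data sets typically cannot observe such characters.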
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Banana Cultivation and Research
