Modeling the distribution of distance data in Euclidean space
Ruth Davidson, Joseph Rusinko, Zoe Vernon, and Jing Xi

TL;DR
This paper models how DNA sequence-derived distance data are distributed in Euclidean space relative to the space of tree metrics, aiding understanding of phylogenetic inference algorithms.
Contribution
It introduces a model of how distance data points are distributed in Euclidean space relative to the space of tree metrics, supporting the analysis of phylogenetic inference methods.
Findings
Distribution of data points relative to polyhedral cones
Insights into the geometry of phylogenetic inference space
Potential improvements in inference accuracy
Abstract
Phylogenetic inference, the derivation of a hypothesis for the common evolutionary history of a group of species, is an active area of research at the intersection of biology, computer science, mathematics, and statistics. One assumes the data contains a phylogenetic signal that will be recovered with varying accuracy due to the quality of the method used and the quality of the data. The input for distance-based inference methods is an element of a Euclidean space with coordinates indexed by the pairs of organisms. For several algorithms there exists a subdivision of this space into polyhedral cones such that inputs in the same cone return the same tree topology. The geometry of these cones has been used to analyze the inference algorithms. In this chapter, we model how input data points drawn from DNA sequences are distributed throughout Euclidean space in relation to the space of…
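To make the abstract's "element of a Euclidean space with coordinates indexed by the pairs of organisms" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which distance is computed from the DNA sequences, so the Jukes-Cantor (JC69) correction is used here as a common stand-in, and the function names (p_distance, jc69_distance, distance_vector, satisfies_four_point) are hypothetical. The last function checks the four-point condition on a single quartet, which is the standard test for whether four taxa's distances fit a tree metric.

```python
from itertools import combinations
from math import log


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (an assumed choice of model).

    The correction is undefined once the mismatch fraction reaches 3/4.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * log(1 - 4 * p / 3)


def distance_vector(seqs):
    """Return one coordinate per unordered pair of taxa.

    For n taxa this is a point in Euclidean space of dimension n*(n-1)/2,
    stored as a dict keyed by (name1, name2) with name1 < name2.
    """
    return {
        (s, t): jc69_distance(seqs[s], seqs[t])
        for s, t in combinations(sorted(seqs), 2)
    }


def satisfies_four_point(d, quartet, tol=1e-9):
    """Check the four-point condition on one quartet of taxa.

    Of the three pairwise sums d(a,b)+d(c,e), d(a,c)+d(b,e), d(a,e)+d(b,c),
    the two largest must be equal (up to tol) for a tree metric.
    """
    a, b, c, e = quartet

    def dist(x, y):
        return d[(x, y)] if (x, y) in d else d[(y, x)]

    sums = sorted([
        dist(a, b) + dist(c, e),
        dist(a, c) + dist(b, e),
        dist(a, e) + dist(b, c),
    ])
    return abs(sums[2] - sums[1]) <= tol
```

Distances estimated from finite sequences will generally fail the exact four-point test, which is why the position of such points relative to the space of tree metrics is worth modeling; the tolerance argument tol is only a numerical convenience, not a statistical criterion.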
Taxonomy
Topics: Genomics and Phylogenetic Studies · Evolution and Paleontology Studies · Morphological variations and asymmetry
