Information Content of a Phylogenetic Tree in a Data Matrix
Tania Roy, Hsieh Fushing, Xunde Li, Brenda McCowan, Rob Atwill

TL;DR
This paper investigates whether the binary structure of phylogenetic trees is data-driven or man-made, and proposes a new data-centric tree model, the Data Cloud Geometry (DCG) tree, that captures authentic data structure through a probabilistic, ultrametric framework.
Contribution
The paper introduces the Data Cloud Geometry (DCG) tree model, revealing the true data-driven structure of phylogenetic trees and contrasting it with traditional hierarchical clustering methods.
Findings
The DCG tree captures authentic data structure based on probabilistic clustering.
Traditional hierarchical clustering imposes an ad hoc distance measure; DCG does not.
The DCG tree exhibits ultrametric properties, which differentiate it from conventional methods.
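Ultrametricity is a standard property: a distance d is ultrametric when d(x, z) ≤ max(d(x, y), d(y, z)) holds for every triple of points, which is what allows the distances to be read as merge heights on a rooted tree. The sketch below checks this strong triangle inequality on a small square distance matrix. The function name `is_ultrametric`, the tolerance, and the toy matrices are illustrative assumptions, not part of the paper's method.

```python
from itertools import permutations
from typing import Sequence


def is_ultrametric(d: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Hypothetical helper: True if the square matrix d is an ultrametric.

    Checks a zero diagonal, symmetry, and the strong triangle inequality
    d[i][k] <= max(d[i][j], d[j][k]) for every ordered triple of distinct items.
    """
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False  # distance from an item to itself must be zero
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:
                return False  # distances must be symmetric
    for i, j, k in permutations(range(n), 3):
        if d[i][k] > max(d[i][j], d[j][k]) + tol:
            return False  # strong triangle inequality violated
    return True


# Toy example: items 0 and 1 merge at height 1, item 2 joins at height 3.
tree_like = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
# Toy example: an ordinary metric that is not ultrametric (2.5 > max(1, 2)).
not_tree_like = [[0, 1, 2], [1, 0, 2.5], [2, 2.5, 0]]
print(is_ultrametric(tree_like), is_ultrametric(not_tree_like))
```

The check is cubic in the number of items, which is fine for illustration but not intended for large species sets.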
Abstract
Phylogenetic trees in genetics, and in biology in general, are all binary. We attempt to answer one fundamental question: is such binary branching, from the coarsest to the finest scales, sustained by data? We convert this question into an equivalent one: where does a data matrix carry the structural information of a tree? Our results on this conceptual and computational question lead us to a negative answer: splitting each branch into two at every internal node, from the top level of the tree to the bottom, is a man-made structure. The data-driven computing paradigm Data Mechanics is employed here to reveal that the information of a tree consists of a set of selected temperatures (or scales), each of which has a clustering composition strictly regulated by a temperature-specific cluster-sharing probability matrix. The resultant Data Cloud Geometry (DCG) tree on the space of species is proposed as…
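The abstract characterizes each selected temperature by a cluster-sharing probability matrix. As a hedged illustration only, the sketch below computes such a matrix as the fraction of repeated clustering runs in which two items fall in the same cluster (a generic co-association matrix), then reads off the clusters at one probability threshold. The input labelings, the function names `cluster_sharing_matrix` and `clusters_at_threshold`, and the thresholding rule are hypothetical; how Data Mechanics generates the runs at each temperature is not reproduced here. The toy example shows how a single level can split into three groups rather than two.

```python
from typing import List, Sequence


def cluster_sharing_matrix(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hypothetical helper: P[i][j] = fraction of runs in which items i and j share a label."""
    runs = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def clusters_at_threshold(p: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Hypothetical helper: connected components of the graph linking i, j when p[i][j] >= threshold."""
    n = len(p)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:  # depth-first search over above-threshold links
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and p[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(component))
    return groups


# Made-up labelings of 6 items from 4 hypothetical clustering runs.
runs = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 2, 2],
]
p = cluster_sharing_matrix(runs)
print(clusters_at_threshold(p, 0.5))  # a three-way split at this level
print(clusters_at_threshold(p, 0.2))  # a coarser level with two groups
```

Lowering the threshold merges groups, so sweeping it traces a tree whose levels need not split in two; this is only an analogy for the multi-way structure the abstract describes.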
Taxonomy
Topics: Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies
