Computing the Distribution of a Tree Metric
David Bryant, Mike Steel

TL;DR
This paper presents a polynomial-time algorithm for computing the distribution of Robinson-Foulds (RF) distances from a given tree, a Poisson approximation to that distribution, and an application to supertree construction.
Contribution
It describes the first explicitly polynomial-time algorithm for the RF distance distribution and shows that the distribution can be approximated by a Poisson distribution.
Findings
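Derives a polynomial-time algorithm for computing the RF distance distribution
Shows that the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in cherries of the given tree
Shows how the results yield the normalization constants required by a recently proposed maximum likelihood approach to supertree construction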
Abstract
The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for twenty years, an algorithm that is explicitly polynomial time has yet to be described for computing this distribution (which is also the distribution of trees around a given tree under the popular Robinson-Foulds metric). In this paper we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in 'cherries' of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.
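To make the two quantities named in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only computes the RF distance between two small trees (the number of non-trivial splits found in exactly one of them) and the proportion of leaves lying in cherries. The nested-tuple tree encoding and the function names `splits`, `rf_distance`, and `cherry_leaf_proportion` are assumptions made for illustration. The cherry count assumes an unrooted binary tree with at least four leaves, where a cherry corresponds to a split with a side of size two.

```python
def leaf_set(tree):
    """Return the leaves of a nested-tuple tree (hypothetical encoding)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on root placement.
    Leaf labels must be mutually comparable (e.g. all strings).
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def cherry_leaf_proportion(tree):
    """Fraction of leaves in cherries (assumes binary tree, n >= 4)."""
    n = len(leaf_set(tree))
    cherries = 0
    for side in splits(tree):
        cherries += (len(side) == 2) + (n - len(side) == 2)
    return 2 * cherries / n


# Two six-leaf example trees that differ only in how a, b, c are grouped.
t1 = ((("a", "b"), "c"), ("d", ("e", "f")))
t2 = ((("a", "c"), "b"), ("d", ("e", "f")))
d = rf_distance(t1, t2)
p = cherry_leaf_proportion(t1)
print(d, p)
```

Storing each split as the side away from a reference leaf keeps the comparison independent of where the nested tuples happen to be rooted, which matters because the RF distance is defined on unrooted splits.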
Taxonomy
Topics: Algorithms and Data Compression · Data Mining Algorithms and Applications · Data Management and Algorithms
