Combinatorial and computational investigations of Neighbor-Joining bias
Ruth Davidson, Abraham Martin del Campo

TL;DR
This paper studies the geometric and combinatorial structure of the polyhedral regions associated with the Neighbor-Joining algorithm's outputs, examines the algorithm's bias, and gives a formula for the number of regions that depends only on the number of taxa.
Contribution
It defines agglomeration orders on trees and uses them to establish a bijection between the distinct regions of the output space and weighted Motzkin paths, giving a full combinatorial description of those regions and a basis for analyzing bias.
Findings
The number of polyhedral regions depends only on the number of taxa
The distinct regions are in bijection with weighted Motzkin paths
A formula counts these regions
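For orientation, an ordinary (unweighted) Motzkin path is a lattice path that starts and ends at height 0, uses up, flat, and down steps, and never goes below 0. The sketch below counts such paths with a small dynamic program. It does not reproduce the paper's weights or its region-counting formula, neither of which is given in this summary; the function name `count_motzkin_paths` is chosen here for illustration only.

```python
def count_motzkin_paths(n):
    """Count unweighted Motzkin paths with n steps.

    A path starts and ends at height 0, takes up (+1), flat (0),
    or down (-1) steps, and never drops below height 0.
    """
    # ways[h] = number of valid partial paths currently at height h
    ways = [1] + [0] * n
    for _ in range(n):
        nxt = [0] * (n + 1)
        for h, w in enumerate(ways):
            if w == 0:
                continue
            nxt[h] += w            # flat step keeps the height
            if h + 1 <= n:
                nxt[h + 1] += w    # up step
            if h > 0:
                nxt[h - 1] += w    # down step, only allowed above 0
        ways = nxt
    return ways[0]


# First few Motzkin numbers, for n = 0..6
motzkin = [count_motzkin_paths(n) for n in range(7)]
```

A weighted variant would multiply `w` by a step-dependent weight before adding it to `nxt`; which weights correspond to Neighbor-Joining regions is specified in the paper, not here.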
Abstract
The Neighbor-Joining algorithm is a popular distance-based phylogenetic method that computes a tree metric from a dissimilarity map arising from biological data. Realizing dissimilarity maps as points in Euclidean space, the algorithm partitions the input space into polyhedral regions indexed by the combinatorial type of the trees returned. A full combinatorial description of these regions has not been found yet; different sequences of Neighbor-Joining agglomeration events can produce the same combinatorial tree, therefore associating multiple geometric regions to the same algorithmic output. We resolve this confusion by defining agglomeration orders on trees, leading to a bijection between distinct regions of the output space and weighted Motzkin paths. As a result, we give a formula for the number of polyhedral regions depending only on the number of taxa. We conclude with a…
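To make "sequence of agglomeration events" concrete, here is a minimal sketch of the standard Neighbor-Joining selection loop, assuming the usual Q-criterion and distance-reduction step. It records only which pairs are joined and in what order, and omits branch-length estimation. The function name, the `frozenset`-keyed dictionary, the stop-at-three-nodes convention, and the 5-taxon example are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def nj_agglomeration_sequence(dissimilarity, taxa):
    """Return the pairs joined by Neighbor-Joining, in order.

    dissimilarity: dict mapping frozenset({a, b}) -> number
    taxa: list of hashable taxon labels
    """
    d = dict(dissimilarity)
    nodes = list(taxa)
    events = []

    def dist(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current dissimilarity map
        row = {a: sum(dist(a, b) for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimizing (n - 2) d(i, j) - r_i - r_j.
        # Ties are broken by enumeration order (an arbitrary choice here).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist(*p) - row[p[0]] - row[p[1]],
        )
        events.append((i, j))
        u = (i, j)  # label the new internal node by the pair it replaces
        rest = [k for k in nodes if k != i and k != j]
        for k in rest:
            # Standard reduction step for the distance to the new node
            d[frozenset((u, k))] = 0.5 * (dist(i, k) + dist(j, k) - dist(i, j))
        nodes = rest + [u]
    return events


# Hypothetical example: tree metric of the tree ((A,B),C,(D,E)) with unit edges
pairs = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
    ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2,
}
example = {frozenset(p): v for p, v in pairs.items()}
events = nj_agglomeration_sequence(example, ["A", "B", "C", "D", "E"])
```

In this example the pairs (A, B) and (D, E) tie under the Q-criterion at the first step, so a different tie-breaking rule would join them in the other order while still yielding the same tree topology. This is the kind of ambiguity that the abstract says agglomeration orders are meant to resolve.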
