Combinatorial and computational investigations of Neighbor-Joining bias

Ruth Davidson; Abraham Martin del Campo

arXiv:2007.09345 · math.CO · September 18, 2020

TL;DR

This paper studies the geometric and combinatorial structure of the polyhedral regions into which the Neighbor-Joining algorithm partitions its space of dissimilarity maps, examines the algorithm's bias, and gives a formula for the number of regions that depends only on the number of taxa.

Contribution

It defines agglomeration orders on trees and uses them to establish a bijection between the distinct regions and weighted Motzkin paths, which removes the ambiguity caused by different sequences of agglomeration events producing the same combinatorial tree.

Findings

01 The number of polyhedral regions depends only on the number of taxa.

02 A bijection is established between the distinct regions and weighted Motzkin paths.

03 A formula is given for counting these regions.
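
For readers unfamiliar with the objects in the second finding, the sketch below counts plain (unweighted) Motzkin paths: lattice paths made of up, flat, and down steps that start and end at height 0 and never go below it. This is only an illustration of the underlying object. It is not the paper's formula, which involves weights that this summary does not specify, and the function name `count_motzkin_paths` is a hypothetical choice.

```python
def count_motzkin_paths(length):
    """Count unweighted Motzkin paths with the given number of steps.

    Assumption: every step has weight 1. The paper's bijection uses
    weighted paths, so these counts are not its region counts.
    """
    # ways[h] = number of valid path prefixes currently at height h
    ways = [1]
    for _ in range(length):
        nxt = [0] * (len(ways) + 1)
        for h, w in enumerate(ways):
            nxt[h] += w          # flat step keeps the height
            nxt[h + 1] += w      # up step raises the height by one
            if h > 0:
                nxt[h - 1] += w  # down step is allowed only above height 0
        ways = nxt
    # Only prefixes that return to height 0 are complete Motzkin paths.
    return ways[0]


if __name__ == "__main__":
    print([count_motzkin_paths(n) for n in range(7)])
```

Running the script prints the first Motzkin numbers, 1, 1, 2, 4, 9, 21, 51.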

Abstract

The Neighbor-Joining algorithm is a popular distance-based phylogenetic method that computes a tree metric from a dissimilarity map arising from biological data. Realizing dissimilarity maps as points in Euclidean space, the algorithm partitions the input space into polyhedral regions indexed by the combinatorial type of the trees returned. A full combinatorial description of these regions has not been found yet; different sequences of Neighbor-Joining agglomeration events can produce the same combinatorial tree, therefore associating multiple geometric regions to the same algorithmic output. We resolve this confusion by defining agglomeration orders on trees, leading to a bijection between distinct regions of the output space and weighted Motzkin paths. As a result, we give a formula for the number of polyhedral regions depending only on the number of taxa. We conclude with a…
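
To make an "agglomeration event" concrete, here is a minimal sketch of how one Neighbor-Joining step selects the pair of taxa to join, using the standard Q-criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The function name `nj_pick_pair`, the list-of-lists input format, and the tie-breaking rule (first minimizer wins) are assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations


def nj_pick_pair(d):
    """Return ((i, j), q) for the pair i < j that minimizes the Q-criterion.

    d is assumed to be a symmetric n x n dissimilarity matrix (list of
    lists) with zeros on the diagonal and n >= 3.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        # Q is linear in the entries of d for a fixed number of taxa.
        q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:  # hypothetical tie-breaking: keep the first minimizer
            best_pair, best_q = (i, j), q
    return best_pair, best_q


if __name__ == "__main__":
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(example))
```

Because each Q value is a linear function of the entries of d, the condition that a given pair wins is a system of linear inequalities in d, which is consistent with the abstract's description of the regions as polyhedral.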
