Bandwidth Selection of Density Estimators over Treespaces
Ruriko Yoshida, Zhiwen Wang

TL;DR
This paper introduces a likelihood cross validation (LCV) method for selecting the bandwidth of tropical kernel density estimators (KDEs) over the space of phylogenetic trees, improving accuracy and computational efficiency over a nearest-neighbor bandwidth choice.
Contribution
It derives an explicit solution for the optimal bandwidth under LCV and demonstrates its effectiveness through simulations and an application to real data.
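The summary does not reproduce the explicit solution itself. As a hedged illustration of what LCV optimizes, assume (our assumption, not necessarily the paper's kernel) a tropical Laplace-type kernel exp(-d_tr(x, x_i)/σ) whose normalizing constant scales as c·σ^p on a p-dimensional space, with n sample trees x_1, ..., x_n. The leave-one-out density, the LCV objective and its first-order condition are then:

```latex
\begin{aligned}
\hat f_{\sigma,-i}(x_i) &= \frac{1}{(n-1)\,c\,\sigma^{p}} \sum_{j\neq i} e^{-d_{ij}/\sigma},
\qquad d_{ij} = d_{\mathrm{tr}}(x_i, x_j),\\
\mathrm{LCV}(\sigma) &= \sum_{i=1}^{n} \log \hat f_{\sigma,-i}(x_i)
= \sum_{i=1}^{n} \log\Big(\frac{1}{n-1}\sum_{j\neq i} e^{-d_{ij}/\sigma}\Big) - np\log\sigma - n\log c,\\
\frac{d\,\mathrm{LCV}}{d\sigma} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\sum_{j\neq i} w_{ij}(\sigma)\, d_{ij} - \frac{np}{\sigma},
\qquad w_{ij}(\sigma) = \frac{e^{-d_{ij}/\sigma}}{\sum_{k\neq i} e^{-d_{ik}/\sigma}},\\
\frac{d\,\mathrm{LCV}}{d\sigma} = 0 \;&\Longleftrightarrow\; \sigma = \frac{1}{np}\sum_{i=1}^{n}\sum_{j\neq i} w_{ij}(\sigma)\, d_{ij}.
\end{aligned}
```

Under this assumed kernel the condition is implicit in σ (the weights depend on σ), so it is not the paper's closed-form result. It does show that the LCV bandwidth is a weighted average of inter-tree tropical distances, and as σ → 0 the weights concentrate on each tree's nearest neighbors.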
Findings
LCV selects better bandwidths than the nearest-neighbor method.
Tropical KDE with an LCV bandwidth outperforms the nearest-neighbor version in accuracy and computational time.
Method successfully applied to empirical genomic data.
Abstract
A kernel density estimator (KDE) is one of the most popular non-parametric density estimators. In this paper we focus on selecting the best bandwidth for an analogue of the classical KDE that uses the tropical symmetric distance, known as a tropical KDE, over the space of phylogenetic trees. We propose likelihood cross validation (LCV) for selecting the bandwidth parameter of this KDE. First, we present the explicit optimal solution for the best-fit bandwidth parameter via LCV for the tropical KDE over the space of phylogenetic trees. Then, computational experiments with simulated datasets generated under the multi-species coalescent (MSC) model show that a tropical KDE with the best-fit bandwidth parameter via LCV performs better than a tropical KDE with a best-fit bandwidth parameter estimated via nearest neighbors…
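A minimal stdlib-only sketch of a tropical KDE with LCV bandwidth selection, under several assumptions: trees are encoded as vectors in R^m (for example, vectors of pairwise leaf distances), the tropical symmetric distance is d_tr(u, v) = max_k(u_k - v_k) - min_k(u_k - v_k), and the kernel is a Laplace-type exp(-d_tr/σ) whose normalizer scales as σ^dim, with dim = m - 1 by default. All function names are hypothetical. The bandwidth is chosen by maximizing the leave-one-out log-likelihood over a user-supplied grid, a generic numerical stand-in rather than the paper's explicit LCV solution.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def tropical_distance(u: Vector, v: Vector) -> float:
    """Tropical symmetric distance on R^m / R1: max_k(u_k - v_k) - min_k(u_k - v_k)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values))); avoids underflow for small sigma."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def lcv_score(dist: List[List[float]], sigma: float, dim: int) -> float:
    """Leave-one-out log-likelihood, up to a constant that does not depend on sigma.

    Assumes the hypothetical kernel exp(-d_tr / sigma) / (c * sigma**dim), so the
    unknown constant c only shifts the score and is dropped. Needs >= 2 trees.
    """
    n = len(dist)
    total = 0.0
    for i in range(n):
        logs = [-dist[i][j] / sigma for j in range(n) if j != i]
        total += _logsumexp(logs) - math.log(n - 1)  # log of the mean kernel value
    return total - n * dim * math.log(sigma)


def select_bandwidth_lcv(sample: List[Vector], grid: Sequence[float],
                         dim: Optional[int] = None) -> float:
    """Return the grid value of sigma that maximizes the LCV score."""
    if dim is None:
        dim = len(sample[0]) - 1  # dimension of R^m / R1 (an assumption)
    dist = [[tropical_distance(x, y) for y in sample] for x in sample]
    return max(grid, key=lambda s: lcv_score(dist, s, dim))


def tropical_kde(x: Vector, sample: List[Vector], sigma: float) -> float:
    """Unnormalized tropical KDE at x: mean of exp(-d_tr(x, x_i) / sigma)."""
    return sum(math.exp(-tropical_distance(x, xi) / sigma) for xi in sample) / len(sample)


if __name__ == "__main__":
    trees = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]  # toy vectors, not real trees
    grid = [k / 10 for k in range(1, 21)]
    best = select_bandwidth_lcv(trees, grid)
    print(best, tropical_kde([0, 0.5, 0.5], trees, best))
```

On the four toy vectors the grid search picks σ = 0.5. A grid search avoids relying on any particular closed form; note that duplicate trees (zero distances) make the score grow without bound as σ → 0, so the lower end of the grid matters.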
Taxonomy
Topics: Genomics and Phylogenetic Studies · Census and Population Estimation · Evolution and Paleontology Studies
