Optimal Phylogenetic Reconstruction from Sampled Quartets
Dionysis Arvanitakis, Vaggos Chatziafratis, Yiyuan Luo, Konstantin Makarychev

TL;DR
This paper establishes the optimal sample complexity for phylogenetic tree reconstruction from quartets, providing an efficient algorithm that recovers trees close to the ground truth with near-linear samples.
Contribution
It introduces an optimal sample complexity bound of Ω(n) for tree reconstruction from quartets and presents a novel algorithm with theoretical guarantees.
Findings
Reconstruction from Ω(n) quartets is information-theoretically optimal.
The proposed algorithm achieves a tree close to the true tree in quartet distance.
A new Ω(n) bound on the Natarajan dimension of phylogenies is established.
Abstract
Quartet Reconstruction, the task of recovering a phylogenetic tree from smaller trees on four species called quartets, is a well-studied problem in theoretical computer science with far-reaching connections to statistics, graph theory and biology. Given a random sample of noisy quartets, labeled by an unknown ground-truth tree on n taxa, we want to output a tree that is close to the ground truth in terms of quartet distance and can predict unseen quartets. Unfortunately, the empirical risk minimizer corresponds to the NP-hard problem of finding a tree that maximizes agreements with the sampled quartets, and earlier works in approximation algorithms gave polynomial-time approximation schemes (PTAS) for dense instances, or for quartets randomly sampled from the ground-truth tree. Prior to our work, it…
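To make the objects in the abstract concrete, here is a minimal brute-force sketch in Python of the quartet a tree induces on four taxa, the quartet distance between two trees, and the number of sampled quartets a candidate tree agrees with (the quantity the empirical risk minimizer maximizes). It is not the paper's algorithm. The adjacency-dict tree representation, the assumption of unrooted binary trees, and all function names (`leaf_distances`, `induced_quartet`, `quartet_distance`, `agreements`) are hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def leaf_distances(adj, leaves):
    """Hop distances from each leaf to every node, via BFS on the tree."""
    dist = {}
    for source in leaves:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[source] = d
    return dist


def induced_quartet(dist, a, b, c, d):
    """Split induced on {a, b, c, d}, e.g. ab|cd, as a frozenset of two pairs.

    Assumes a binary tree, so by the four-point condition exactly one of
    the three pairings has the strictly smallest sum of path lengths.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    best = min(pairings, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
    return frozenset(frozenset(pair) for pair in best)


def quartet_distance(adj1, adj2, leaves):
    """Fraction of 4-taxon subsets on which the two trees disagree."""
    d1, d2 = leaf_distances(adj1, leaves), leaf_distances(adj2, leaves)
    quads = list(combinations(sorted(leaves), 4))
    return sum(induced_quartet(d1, *q) != induced_quartet(d2, *q) for q in quads) / len(quads)


def agreements(adj, leaves, sample):
    """Number of sampled quartets (given as splits) that the tree induces."""
    dist = leaf_distances(adj, leaves)
    return sum(induced_quartet(dist, *sorted(set().union(*q))) == q for q in sample)


def tree(edges):
    """Build an undirected adjacency dict from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


leaves = ["a", "b", "c", "d", "e"]
# Caterpillar ((a,b),c,(d,e)) with internal nodes u, v, w.
t1 = tree([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("v", "w"), ("d", "w"), ("e", "w")])
# Same shape with taxa b and c swapped.
t2 = tree([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("v", "w"), ("d", "w"), ("e", "w")])

ab_cd = frozenset({frozenset("ab"), frozenset("cd")})
ab_de = frozenset({frozenset("ab"), frozenset("de")})
print(quartet_distance(t1, t2, leaves))
print(agreements(t2, leaves, [ab_cd, ab_de]))
```

Enumerating all 4-taxon subsets costs on the order of n^4 quartet evaluations, so this sketch is only useful for checking small examples; it illustrates the definitions rather than an efficient method.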
